


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXX November, 1939 Number 8 








THE EFFECT OF ILLUMINATION INTENSITIES UPON 
SPEED OF PERCEPTION AND UPON FATIGUE 
IN READING* 


MILES A. TINKER 


University of Minnesota 


Three important aspects of lighting are (1) quality or color, (2) 
distribution or diffusion, and (3) intensity or brightness. Interpreta- 
tions of experimental data obtained in studying the first two are in fair 
agreement. In the third field, however, Luckiesh and Moss‘ are far 
from agreeing with practically all other investigators in the interpreta- 
tion of findings on intensity of light in relation to efficient vision. In 
order that illumination standards may have a sound basis in experi- 
mental data it is highly desirable, therefore, that further investigation 
be done. 

The purpose in this experiment is to study the effect of changes in 
illumination intensity upon speed of perception and upon fatigue in 
reading. A speed of reading technique was employed to measure 
speed of perception and the “li” test was used to measure changes in 
clearness of seeing or fatigue effects. The influence upon performance 
of partial and complete adaptation of the eye to each level of brightness 
was also investigated. 

There are three parts to the experiment. In Part I speed of reading was measured when the eye had had only two minutes to adapt to the intensity of light used. In Part II, speed of reading was measured after the eye had been adapted for fifteen minutes to each level of brightness. The effect upon clearness of seeing of two hours' reading under specific levels of brightness was investigated in Part III. The method will be explained as each part is discussed.





* The expenses for the first part of this study (data in Table I) were met by a 
CWA grant, Project No. 3206; the remaining expenses by a research grant from 
the Graduate School, University of Minnesota. 




PART I 


The light intensities employed were 0.1, 0.7, 3.1, 10.3, 17.4 and 53.3 
foot-candles. Each of these figures is an average of five determina- 
tions made with a Macbeth illuminometer. 

The experiment was conducted in an inner, windowless room with the following set-up. Upon a table was constructed a cubicle, thirty by thirty by thirty inches, with gray cardboard walls. The front, along one edge of the table, was open. In the ceiling of the cubicle was a twenty-four-inch square piece of ground glass. Above the ground glass, in a compartment, was a bank of light bulbs that could be lit in the proper combinations to obtain the desired intensity. Each group of lighted bulbs could be centered over the middle of the panel of ground glass. This arrangement insured a highly uniform distribution of light over the working surface on the table. It produced general diffused illumination of the cubicle but not of the experimental room. (Results will be reported later in which diffused lighting of the whole experimental room was employed.) To change the illumination from one level of brightness to another, it was necessary merely to plug in a different line leading to the light box at the top of the apparatus.

The reading material was placed at a constant position on the table top underneath the light. The reader sat in front of the open side of the cubicle with his forehead against a tape stretched from side to side. This assured a constant distance of about fourteen inches from eyes to print.

The reading material consisted of Forms A and B of the Chapman-Cook Speed of Reading Test. On the average, performance on Form B is equivalent to that on Form A. However, the forms sometimes depart from equivalence because of factors other than sampling errors.⁷ It is therefore necessary to use a control group, as explained below. Each test form contained thirty paragraphs of thirty words each. They were printed in ten-point type set solid on egg-shell paper stock, with a slight amount of extra space between paragraphs. In each paragraph the reader must cross out one word to show that the paragraph has been read with understanding. The score is the number of paragraphs read correctly in one and three-fourths minutes.

Six groups of eighty-two university sophomores each were employed as readers. The subjects were measured one at a time. Each subject read Form A of the reading test as a standard under 10.3 foot-candles of light. Form B was then read as follows: under 0.1 foot-candle in Group I, 0.7 foot-candle in II, 3.1 foot-candles in III, 10.3 foot-candles in IV (control group), 17.4 foot-candles in V, and 53.3 foot-candles in VI. By this technique one is able to find the speed of reading under each level of illumination in comparison with the standard. One can also determine the brightness level beyond which there is no further improvement in performance with increase in light intensity.

Group IV is the control group, in which the brightness is the same for Form B as for Form A of the test. Slight variations in test administration sometimes upset the equivalence of the test forms.⁷ It is advisable, therefore, to include in each experiment a control group with conditions (light, print, etc.) constant for both Form A and Form B. A correction can then be made in the means of the experimental groups for whatever deviation occurs between Forms A and B in the control group.
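The article does not spell out the arithmetic of this correction. The following is a minimal sketch that assumes the simplest additive form: subtract the control group's Form A minus Form B deviation from each experimental group's difference. The function name `corrected_difference` is hypothetical.

```python
def corrected_difference(mean_a: float, mean_b: float,
                         control_a: float, control_b: float) -> float:
    """Form A minus Form B difference for an experimental group, corrected
    for the Form A minus Form B shift observed in the control group.

    Assumption: the correction is additive; the article states only that a
    correction "can be made" for the control group's deviation.
    """
    control_shift = control_a - control_b  # shift not attributable to lighting
    return (mean_a - mean_b) - control_shift


if __name__ == "__main__":
    # Group II (0.7 fc) and control Group IV (10.3 fc on both forms), Table I means.
    d = corrected_difference(17.05, 15.67, 16.72, 16.60)
    print(f"corrected difference: {d:.2f} paragraphs")  # 1.26
```

In the study itself the control difference in Part I (.12 paragraph) was judged to lie within chance, so no correction was actually applied.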

In the testing of the subjects, the first was measured under condi- 
tions for Group I, the next for Group II, etc., through Group VI. 
Then the sequence was begun over again so that after successive groups 
of six subjects there were the same number in each of the subgroups. 
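The rotation described above amounts to assigning the k-th subject tested to group k mod 6. A short sketch follows; the names are illustrative, not from the study.

```python
from collections import Counter

# Group label -> Form B intensity in foot-candles (Form A always read at 10.3 fc).
GROUP_INTENSITY_FC = {"I": 0.1, "II": 0.7, "III": 3.1, "IV": 10.3, "V": 17.4, "VI": 53.3}


def assign_groups(n_subjects: int) -> list[str]:
    """Assign subjects, in testing order, to Groups I..VI in strict rotation."""
    labels = list(GROUP_INTENSITY_FC)  # dicts keep insertion order
    return [labels[i % len(labels)] for i in range(n_subjects)]


if __name__ == "__main__":
    assignments = assign_groups(492)  # 6 groups x 82 subjects in Part I
    print(Counter(assignments))
```

After every complete cycle of six subjects the groups are equal in size, which is the property the procedure relies on.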

In most experiments on the influence of light upon performance, 
nothing is said about the adaptation condition of the subject’s eyes. 
In Part I of this study the subject was adapted under each intensity of 
light used for two minutes prior to the measurement period. This was 
just about the time used for instructions to the subject and the practice 
exercise. It is important to know whether the variation in visual 
acuity that comes with variation in degree of adaptation is affecting 
the results. The two minutes used in Part I is probably representative
of the adaptation time allowed incidentally in many illumination 
experiments. In Part II a longer adaptation period was employed. 

Results.—The results for Part I are given in Table I. Scores are in terms of paragraphs of thirty words each that were read correctly in one and three-fourths minutes. Note that in Group IV, the control group, the difference between means is only .12 of a paragraph. This is well within the variation that may be expected from chance factors,⁷ and no correction need be applied to the mean scores in the other groups.

In Group I it is seen that 4.91 fewer paragraphs were read under 0.1 foot-candle than under 10.3 foot-candles. This is practically a twenty-nine per cent retardation and is a highly stable difference. A similar trend is noted in Group II: under 0.7 foot-candle, 1.38 fewer paragraphs were read than under 10.3 foot-candles. This difference of about eight per cent is highly stable. When 3.1 foot-candles are used in Group III, the retardation is .77 paragraph, or four and five-tenths per cent, and is also a stable difference. Thus reading under 0.1, 0.7, or 3.1 foot-candles is significantly slower than under 10.3 foot-candles.


TABLE I.—EFFECT OF LIGHT INTENSITIES UPON SPEED OF READING
(Period of Adaptation = Two Minutes)
Differences given are for mean score on Form A minus mean score on Form B, Chapman-Cook Speed of Reading Test printed in ten-point type on egg-shell paper stock. In each test group N = 82 university sophomores (Total = 492). Entries marked ? are illegible in the source.

                                     Difference between means in:
Test    Foot-                        ----------------------------
group   candles   Mean    PE(M)      Paragraphs     Per cent      PE(D)    D/PE(D)
I       10.3      17.04   .30
        0.1       12.13   ?          4.91           28.81         .22      21.94
II      10.3      17.05   .29
        0.7       15.67   ?          1.38            8.09         .16       8.45
III     10.3      17.10   .28
        3.1       16.33   ?           .77            4.50         .15       4.99
IV      10.3      16.72   .31
        10.3      16.60   .27         .12             .72         .17       0.71
V       10.3      17.06   .30
        17.4      16.98   ?           .08             .47         .37       0.23
VI      10.3      17.16   .31
        53.3      17.05   ?           .11             .64         .18       0.62


























The comparison in Group V shows that material is read no faster 
under 17.4 foot-candles than under 10.3 foot-candles. The .08 para- 
graphs difference may be considered a chance variation. Similarly in 
Group VI, reading is no faster under 53.3 foot-candles than under 
10.3 foot-candles. The difference of .11 is due to chance. 

In general, when the eye is adapted for only two minutes to the 
brightness of light used, speed of reading increases with increases in 
intensity up to a point which lies somewhere between 3.1 and 10.3 
foot-candles. Further increases in brightness of light produced neither 
faster nor slower reading than that found at 10.3 foot-candles. 
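The per cent column of Table I can be checked from the other columns. It agrees with the difference divided by the Form A mean: for Group I, 4.91 / 17.04 ≈ 28.81 per cent. The last column is the difference divided by its probable error.

The sketch below recomputes both from the printed values. Because the printed probable errors are rounded to two places, the recomputed ratios differ somewhat from the printed D/PE column. For Group I, 4.91 / .22 ≈ 22.3, against the printed 21.94.

```python
# (group, Form A mean at 10.3 fc, D = Form A mean - Form B mean, PE of D), from Table I
TABLE_I = [
    ("I", 17.04, 4.91, 0.22),
    ("II", 17.05, 1.38, 0.16),
    ("III", 17.10, 0.77, 0.15),
    ("IV", 16.72, 0.12, 0.17),
    ("V", 17.06, 0.08, 0.37),
    ("VI", 17.16, 0.11, 0.18),
]


def per_cent_retardation(mean_a: float, diff: float) -> float:
    """Difference expressed as a percentage of the Form A (10.3 fc) mean."""
    return 100.0 * diff / mean_a


def critical_ratio(diff: float, pe_diff: float) -> float:
    """Difference divided by its probable error (D/PE)."""
    return diff / pe_diff


if __name__ == "__main__":
    for group, mean_a, diff, pe_diff in TABLE_I:
        print(f"{group:>3}: {per_cent_retardation(mean_a, diff):6.2f} %  "
              f"D/PE = {critical_ratio(diff, pe_diff):5.2f}")
```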







PART II 


In this part of the experiment, the apparatus for control of illumina- 
tion intensities, the reading material, levels of brightness used, and 
experimental procedure were the same as in Part I with the following 
exceptions: (1) There were seventy-two readers in each sub-group, and 
(2) the subjects were adapted for fifteen minutes to each light intensity 
before measurement began. It is usually stated that the major portion 
of retinal adaptation has taken place by fifteen to twenty minutes. If 
the results in Part II differ from those in Part I it should be due to the 
difference in adaptation time, i.e., two versus fifteen minutes. 

Results.—The results are given in Table II. Inspection of the data for Subgroup IV, the control group, reveals equivalent scores for Form A and Form B of the reading test. No correction, therefore, need be made in the mean scores in the other subgroups.

TABLE II.—EFFECT OF LIGHT INTENSITIES UPON SPEED OF READING
(Period of Adaptation = Fifteen Minutes)
Differences given are for mean score on Form A minus mean score on Form B, Chapman-Cook Speed of Reading Test printed in ten-point type on egg-shell paper stock. In each test group N = 72 university sophomores (Total = 432). Entries marked ? are illegible in the source.

                                     Difference between means in:
Test    Foot-                        ----------------------------
group   candles   Mean    PE(M)      Paragraphs     Per cent      PE(D)    D/PE(D)
I       10.3      18.68   .12
        0.1       17.03   .10        1.65            8.83         .17       9.54
II      10.3      18.56   .11
        0.7       17.68   ?           .88            4.74         .18       4.86
III     10.3      17.88   .12
        3.1       17.78   .09         .10             .56         .18       0.55
IV      10.3      18.40   .11
        10.3      18.55   ?          -.15            -.82         .14       1.08
V       10.3      18.39   .12
        17.4      18.14   ?           .25            1.36         .16       1.59
VI      10.3      18.75   .14
        53.3      18.88   .13        -.13            -.69         .17       0.74

Reading under 0.1 foot-candle was eight and eight-tenths per cent 
or 1.65 paragraphs slower than under 10.3 foot-candles. Similarly the 
subjects read five per cent slower under 0.7 foot-candle than under 
10.3 foot-candles. Both differences are highly stable. 

A different trend is found in Group III. The test material was read just as fast under 3.1 foot-candles as under 10.3 foot-candles. The same holds in Groups V and VI, which show that reading rates under 17.4 and 53.3 foot-candles are the same as under 10.3 foot-candles. In other words, as light intensity is increased, reading rate increases from 0.1 to 3.1 foot-candles. It does not increase further when the brightness is raised above the 3.1 foot-candle level.

It appears, therefore, that when the eye is adequately adapted to the light intensity under which it is to work, the critical level for effective seeing is at about three foot-candles or slightly below. This result is to be contrasted with the data in Part I (two minutes' adaptation), where the critical level seemed to lie between 3.1 and 10.3 foot-candles. The contrast between the results of Parts I and II emphasizes the importance of providing adequate visual adaptation to the illumination intensity under which the eye is to work.
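The "critical level" reasoning of Parts I and II can be stated as a small procedure. Start from the brightest intensity and move downward, lowering the level as long as the difference from the 10.3 foot-candle standard is not reliable.

The sketch below applies this to the printed D/PE values. It assumes a cutoff of D/PE ≥ 4 for a reliable difference, a convention often used at the time; the article itself calls its differences "stable" without stating a cutoff. The function name is hypothetical.

```python
STANDARD_FC = 10.3  # Form A was always read at 10.3 foot-candles


def critical_level(d_over_pe: dict[float, float], cutoff: float = 4.0) -> float:
    """Lowest tested intensity at and above which no tested level differs
    reliably from the 10.3 fc standard. The cutoff is an assumed convention."""
    levels = sorted(set(d_over_pe) | {STANDARD_FC})
    lowest_ok = levels[-1]
    for level in reversed(levels):          # brightest first
        ratio = d_over_pe.get(level, 0.0)   # the standard is not compared with itself
        if abs(ratio) >= cutoff:
            break                           # reliably different: stop descending
        lowest_ok = level
    return lowest_ok


# Printed D/PE values from Tables I and II (control Group IV omitted).
PART_I = {0.1: 21.94, 0.7: 8.45, 3.1: 4.99, 17.4: 0.23, 53.3: 0.62}
PART_II = {0.1: 9.54, 0.7: 4.86, 3.1: 0.55, 17.4: 1.59, 53.3: 0.74}

if __name__ == "__main__":
    print("two-minute adaptation:", critical_level(PART_I))       # 10.3
    print("fifteen-minute adaptation:", critical_level(PART_II))  # 3.1
```

Because only six levels were tested, the function can return only a tested level. For Part I the critical point lies somewhere between 3.1 and 10.3 foot-candles, as the text says.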


PART III 


In this part of the experiment the effect upon clearness of seeing 
(fatigue) of prolonged reading under various intensities of light was 
measured. The same intensities were employed as in Parts I and II. 

The ratio of clear to blurred vision (clearness of seeing) was determined before and after reading for two hours. Clearness of seeing was measured by the “li” test devised by Ferree and Rand.¹ The letters “li,” printed in ten-point type, were placed in the center of an eight and one-half by eleven inch white card. This card, located on a slightly tilted reading stand, was illuminated with 20 foot-candles of well-diffused light. The general illumination of the experimental room was approximately 8 foot-candles.

A timer was connected with two keys so that pressure by the right hand started the timer and pressure by the left stopped it. The “li” test lasted three minutes. The subject sat with the eyes thirty inches from the test object and fixated on the “li.” The timer was started by pressing the right-hand key at the beginning of the test. As soon as the object blurred, the left-hand key was pressed, stopping the timer. Blurring meant that the separation between the i and its dot was no longer distinguishable, so that the test object looked like two l's. Then, as soon as vision cleared so that the dot was seen as separated from the i, the timer was started again. In this manner the test was continued for the three minutes.

The score yielded is the number of seconds of blurred vision and the number of seconds of clear vision during the three-minute period. Although the reliability of the “li” test has been questioned,² Ferree and Rand¹ have demonstrated that it is highly reliable if the subject is given sufficient practice to stabilize his responses.
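The two-key timer procedure can be simulated to show how the score is derived. The timer runs only while vision is clear, so its final reading is the seconds of clear vision, and the remainder of the 180 seconds is blurred vision. In the sketch below, the event-list representation and the event times in the example are hypothetical.

```python
from typing import Optional

TEST_DURATION_S = 180.0  # the "li" test lasted three minutes


def score_li_test(events: list[tuple[float, str]],
                  duration: float = TEST_DURATION_S) -> tuple[float, float]:
    """Return (seconds clear, seconds blurred) for one test.

    `events` is a hypothetical, time-ordered list of (seconds, state) pairs,
    where state is "blurred" (left-hand key stops the timer) or "clear"
    (right-hand key restarts it). The timer starts at 0 with vision clear.
    """
    clear_total = 0.0
    running_since: Optional[float] = 0.0  # right-hand key pressed at the start
    for t, state in events:
        if state == "blurred" and running_since is not None:
            clear_total += t - running_since  # close the current clear interval
            running_since = None
        elif state == "clear" and running_since is None:
            running_since = t                 # i and its dot seen separately again
    if running_since is not None:             # still clear when the test ends
        clear_total += duration - running_since
    return clear_total, duration - clear_total


if __name__ == "__main__":
    # Made-up event times, for illustration only.
    events = [(40.0, "blurred"), (55.0, "clear"), (120.0, "blurred"), (150.0, "clear")]
    print(score_li_test(events))  # (135.0, 45.0)
```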

Two women, university graduates, served as subjects. Ferree and Rand's procedure¹ for conducting the test was adhered to. The head of the subject was maintained in a constant position. Directions were given to ensure three things: that the subject maintained a definite set, that she clearly understood the task, and that she did not blink more than was absolutely necessary. On successive days each subject was given the three-minute “li” test several times a day, with a one-hour interval between tests. At the end of about six days the responses had stabilized, so that the amount of time devoted to clear and to blurred vision became approximately constant from test to test.

What does the “li” test measure? Obviously it measures the degree of blurring which occurs during three minutes of exacting visual effort. It is immaterial whether one calls it visual fatigue or keeps to the more descriptive term “clear seeing.” In any case the “li” test is a sensitive indicator of the extent to which the eye is able to maintain clear seeing during visual effort. If the loss in ability to see clearly is greater after working under some brightness levels than under others, presumably the greater loss is due to ineffective lighting. The greater the loss, the more serious is the “fatiguing” effect of work under that particular light intensity.

After stability of response in the “li” test had been achieved, the experiment proper was begun. Precautions were taken not to use the subjects on days when they were tired, felt eye strain, etc. On each of six days the subject was given the “li” test after fifteen minutes' adaptation to the light in the experimental room. This was followed by two hours of continuous reading. The Saturday Evening Post, which is printed in 10-point type, constituted the reading material. At the end of the two-hour period the “li” test was again given. The 10.3 foot-candle illumination was used on the first day. In the succeeding periods the order of light intensities used was random and varied from one subject to the other. Thus for one subject it was 10.3, 17.4, 0.7, 0.1, 3.1, 53.3 foot-candles.
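The sketch below draws such a schedule. It assumes a simple random permutation of the five remaining intensities for each subject; the article does not say how the random order was generated.

```python
import random

INTENSITIES_FC = [0.1, 0.7, 3.1, 10.3, 17.4, 53.3]
FIRST_DAY_FC = 10.3  # the 10.3 fc condition was always used on the first day


def session_order(rng: random.Random) -> list[float]:
    """One subject's six-day schedule: 10.3 fc first, the rest in random order."""
    remaining = [fc for fc in INTENSITIES_FC if fc != FIRST_DAY_FC]
    rng.shuffle(remaining)  # in-place random permutation
    return [FIRST_DAY_FC] + remaining


if __name__ == "__main__":
    rng = random.Random()  # pass a seed for a reproducible schedule
    for subject in (1, 2):
        print(f"Subject {subject}:", session_order(rng))
```

Drawing a fresh permutation per subject gives orders that "varied from one subject to the other," as in the order reported for one subject above.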

The results will be given for each subject separately. Those for Subject No. 1 are shown in Table III. When 0.1 foot-candle was employed, there was a marked increase in the amount of blurring after two hours of reading. In fact the per cent of time blurred rose from thirty-three before reading to seventy-six after reading. Correspondingly, the ratio of clear to blurred vision is 2.00 before reading and only 0.32 after reading. A similar trend is noted when reading under 0.7 foot-candle. The per cent of time blurred is thirty-seven before reading and sixty-two after reading; the corresponding ratios of clear to blurred are 1.70 and 0.62. Obviously the two hours' reading under these two levels of brightness markedly increased the blurring of vision during the “li” test.

TABLE III.—MEASUREMENT OF CLEAR SEEING
(Period of Adaptation = Fifteen Minutes)
Subject No. 1: Effect of light intensities upon clear seeing (fatigue) by two hours' reading.

Foot-     Blurring     Seconds of:          Ratio: clear    Per cent of
candles   measured     Clear    Blurred     to blurred      time blurred
0.1       Before        120       60         2.00            33
          After          44      136          .32            76
0.7       Before        113       67         1.70            37
          After          69      111          .62            62
3.1       Before        121       59         2.10            32
          After         125       55         2.30            31
10.3      Before         94       86         1.10            47
          After          97       83         1.20            46
17.4      Before         97       83         1.20            46
          After          96       84         1.10            47
53.3      Before         99       81         1.20            46
          After          84       96          .88            53
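Both summary columns of Table III follow from the two timed quantities:

- the ratio is clear seconds divided by blurred seconds;
- the per cent blurred is blurred seconds divided by the 180-second test.

The sketch below recomputes them. A few printed entries appear to be rounded to one decimal place (for example 2.10 where 121/59 ≈ 2.05), so recomputed values agree with the table only to within rounding.

```python
def li_summary(clear_s: float, blurred_s: float) -> tuple[float, int]:
    """Ratio of clear to blurred time, and per cent of time blurred (whole number)."""
    ratio = clear_s / blurred_s
    pct_blurred = round(100 * blurred_s / (clear_s + blurred_s))
    return ratio, pct_blurred


# Subject No. 1, Table III: foot-candles -> ((clear, blurred) before, (clear, blurred) after)
SUBJECT_1 = {
    0.1: ((120, 60), (44, 136)),
    0.7: ((113, 67), (69, 111)),
    3.1: ((121, 59), (125, 55)),
    10.3: ((94, 86), (97, 83)),
    17.4: ((97, 83), (96, 84)),
    53.3: ((99, 81), (84, 96)),
}

if __name__ == "__main__":
    for fc, (before, after) in SUBJECT_1.items():
        rb, pb = li_summary(*before)
        ra, pa = li_summary(*after)
        print(f"{fc:>5} fc  before: {rb:.2f}, {pb}%   after: {ra:.2f}, {pa}%")
```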

Reading under the next three higher intensities reveals a quite different trend. For 3.1 foot-candles the per cent of time blurred was thirty-two before reading and thirty-one after. For 10.3 foot-candles the figures were forty-seven and forty-six, and for 17.4 foot-candles, forty-six and forty-seven. At these three levels two hours of reading produced no increase in the blurring of vision. At 53.3 foot-candles, the highest level of brightness employed, however, there was again a slight tendency for increased blurring after two hours of reading. It is possible that brightness glare in this particular set-up was beginning to disturb visual efficiency. In any case the effect was small. A fair judgment would be that clearness of seeing was little altered by two hours' work under 53.3 foot-candles.

The results for Subject No. 2 are given in Table IV. Examination of these data in comparison with those in Table III reveals a remarkable correspondence between the trends for Subjects No. 1 and No. 2. There was an increased percentage of blurring after two hours' reading under 0.1 and 0.7 foot-candle of light. For 3.1, 10.3, and 17.4 foot-candles there was little or no change from the clearness of seeing prior to the reading period. Finally, there was a slight increase in blurring at 53.3 foot-candles.

TABLE IV.—MEASUREMENT OF CLEAR SEEING
(Period of Adaptation = Fifteen Minutes)
Subject No. 2: Effect of light intensities upon clear seeing (fatigue) by two hours' reading.

Foot-     Blurring     Seconds of:          Ratio: clear    Per cent of
candles   measured     Clear    Blurred     to blurred      time blurred
0.1       Before        149       31         4.81            17
          After         122       58         2.10            32
0.7       Before        146       34         4.29            19
          After         137       43         3.19            24
3.1       Before        150       30         5.00            17
          After         150       30         5.00            17
10.3      Before        145       35         4.14            19
          After         150       30         5.00            17
17.4      Before        147       33         4.45            17
          After         150       30         5.00            17
53.3      Before        148       32         4.63            18
          After         132       48         2.75            27
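The agreement between the two subjects can be made explicit by taking, for each intensity, the change in per cent of time blurred (after minus before) from Tables III and IV. In the sketch below, an intensity is flagged when both subjects show a rise of at least 5 points. That threshold is arbitrary and illustrative; it is not a criterion used in the article.

```python
# foot-candles -> (S1 before, S1 after, S2 before, S2 after): per cent of time blurred,
# from Tables III (Subject No. 1) and IV (Subject No. 2).
PCT_BLURRED = {
    0.1: (33, 76, 17, 32),
    0.7: (37, 62, 19, 24),
    3.1: (32, 31, 17, 17),
    10.3: (47, 46, 19, 17),
    17.4: (46, 47, 17, 17),
    53.3: (46, 53, 18, 27),
}


def blur_changes(table: dict[float, tuple[int, int, int, int]]) -> dict[float, tuple[int, int]]:
    """Change in per cent blurred (after - before) for each subject."""
    return {fc: (a1 - b1, a2 - b2) for fc, (b1, a1, b2, a2) in table.items()}


def consistent_increase(changes: dict[float, tuple[int, int]], min_points: int = 5) -> list[float]:
    """Intensities at which both subjects' blurring rose by at least `min_points`."""
    return sorted(fc for fc, (c1, c2) in changes.items()
                  if c1 >= min_points and c2 >= min_points)


if __name__ == "__main__":
    changes = blur_changes(PCT_BLURRED)
    for fc, (c1, c2) in changes.items():
        print(f"{fc:>5} fc: subject 1 {c1:+d}, subject 2 {c2:+d}")
    print("both rose >= 5 points:", consistent_increase(changes))  # [0.1, 0.7, 53.3]
```

With this threshold the 53.3 foot-candle condition is flagged along with 0.1 and 0.7. Its rise (+7 and +9 points) is, however, much smaller than the rise at 0.1 foot-candle (+43 and +15).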

The results in this part of the experiment indicate that the critical level of illumination intensity for reading ten-point type is approximately 3 foot-candles. With prolonged work at lower intensities, clearness of seeing critical details is markedly reduced. With intensities ranging from 3.1 to 53.3 foot-candles, clearness of seeing is little affected by prolonged visual work.


DISCUSSION 


It is obvious from comparing the results in Parts I and II of this 
experiment that the adaptation of the eye must be controlled in study- 
ing the effect of illumination intensity upon visual efficiency. The eye 
should be given sufficient time to adapt itself to the illumination under 
which the visual work is to be done. Interpretation of results from 
experiments in which adaptation of the eye has not been controlled 
may be questioned as to validity. 

In Parts II and III the eye was given adequate adaptation to the illumination used. Speed of reading was measured in Part II and clearness of vision in Part III. The trends of the measured effects in these two parts coincide almost exactly, although two quite different types of measurement were employed. In both, efficiency of performance was adversely affected by intensities below 3.1 foot-candles. In both, too, performance was equally efficient under 3.1, 10.3, 17.4, and 53.3 foot-candles.
These data indicate, therefore, that the critical intensity level is about 
3 foot-candles for reading ten-point type. There are other results® 
that tend to substantiate these findings. 

The experiments of Luckiesh and Moss have led them to recom- 
mend*:* what seem to be excessively high light intensities for visual 
tasks. For example, they state that 20 to 50 foot-candles should be 
employed for ordinary reading. Re-examination®® of their data, 
however, suggests that their conclusions are not justified by their 
findings. 

What illumination intensities should be employed to assure hygienic 
vision during reading and comparable tasks? Tinker*® has surveyed
the available experimental material. Although all indications suggest 
that the critical level for reading 10- to 12-point type is around 3 to 
4 foot-candles (most certainly this critical level is below 10 foot- 
candles) it is suggested that any specifications made should provide a 
margin of safety. The conclusion is that 10 to 15 foot-candles should 
provide hygienic conditions when one’s eyes are normal and print is 
legible. Fine discriminations require 20 to 25 foot-candles® for 
adequate vision. 


SUMMARY AND CONCLUSIONS 


(1) The effect of illumination intensities was measured (a) upon speed of reading with two minutes' adaptation to each light used, (b) upon speed of reading with fifteen minutes' adaptation to each light used, and (c) upon clearness of seeing, by means of the “li” test.

(2) Intensities of light employed were 0.1, 0.7, 3.1, 10.3, 17.4, and 
53.3 foot-candles. 

(3) With two minutes' adaptation, intensities below 10.3 foot-candles significantly retarded speed of reading. The rate of reading was the same for 10.3, 17.4, and 53.3 foot-candles.

(4) With fifteen minutes' adaptation, intensities below 3.1 foot-candles significantly retarded speed of reading. The rate was the same for 3.1 foot-candles and above.

(5) Clearness of seeing as measured by the “li” test was reduced by two hours of reading under intensities of less than 3.1 foot-candles. Clearness of seeing was little disturbed, however, by two hours of reading under 3.1, 10.3, 17.4, and 53.3 foot-candles, although there was a slight decrease at the 53.3 level.

(6) When adequate adaptation was allowed, the findings by both methods are in agreement. They indicate that the critical level of illumination for reading ten-point type is approximately 3 foot-candles. That is, raising the intensity above 3.1 foot-candles does not increase speed of reading. Nor does it significantly change clearness of seeing (fatigue) after two hours' reading.

(7) Since a margin of safety is desirable, it is suggested that 10 to 15 foot-candles be employed when reading type of this size or slightly larger.


REFERENCES 


1. Ferree, C. E., and Rand, G.: “An investigation of the reliability of the ‘li’ test.” Trans. Illum. Eng. Soc., Vol. xxii, 1927, pp. 52-75.

2. Luckiesh, M., Cobb, P. W., and Moss, F. K.: “An investigation of the reliability of the ‘li’ test.” Trans. Illum. Eng. Soc., Vol. xxii, 1927, pp. 43-51.

3. Luckiesh, M., and Moss, F. K.: “A correlation between illumination intensity and nervous muscular tension resulting from visual effort.” J. Exper. Psychol., Vol. xvi, 1933, pp. 540-555.

4. Luckiesh, M., and Moss, F. K.: The Science of Seeing. New York: D. Van Nostrand Co., 1937.

5. Tinker, M. A.: “Cautions concerning illumination intensities for reading.” Amer. J. Optom., Vol. xii, 1935, pp. 43-51.

6. Tinker, M. A.: “Illumination standards for effective and comfortable vision.” J. Consult. Psychol., Vol. iii, 1939, pp. 11-20.

7. Tinker, M. A., and Paterson, D. G.: “Studies of typographical factors influencing speed of reading. XIII. Methodological considerations.” J. Appl. Psychol., Vol. xx, 1936, pp. 132-145.




TWIN SIMILARITIES IN PHOTOGRAPHIC MEASURES 
OF EYE MOVEMENTS WHILE READING PROSE 


DAVID H. MORGAN* 
Institute of Child Welfare, University of California 


A. INTRODUCTION 


Early studies which demonstrated individual differences in eye 
movements during reading'* led to the theory that eye movements are 
dependent upon motor habits. The progressions, regressions, and 
fixations of eye movements for an individual were found to exhibit 
a certain pattern which characterized his reading for that type of 
material. This pattern was said to be determined by the individual’s 
“habit” in reading. From this “habit” concept, a further theory
emerged that the adequate training of eye movements provides a 
means for the development of efficient reading, with the corollary 
that certain forms of reading disability may be mitigated or eliminated 
by appropriate retraining of faulty eye movements. !!-!2-15 

An opposed point of view is represented by Tinker,'* who main- 
tained that faulty eye movements are the outcome, not the cause, of 
poor reading. He attributed the improvement in reading shown by certain cases after motor training to increased motivation and to a widened recognition span, rather than to any direct effect of improved eye movements upon reading proficiency.

Because of the finding that more than one fixation per word occurs 
occasionally in the plotted records of good readers, Sisson" criticised 
the advocates of the “habit” concept for assuming that good reading
requires rhythmical fixations evenly spaced along the line. He con- 
cluded from a review of the literature and from his own results that 
eye movements are not the basis of reading ability, but are “mainly
expressions of underlying processes of assimilation.” In a later
report!? he examined quantitatively the characteristics assumed in 
Dearborn's theory of reading as a “short-lived motor habit,” viz.,
a rhythmical series of the same number of pauses per line, and a par- 
ticular pattern of fixations. Neither of these characteristics having 
been found to be a consistent differentiating factor between good and 
poor readers, Sisson concluded that the theory appears to be useless. 
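The characteristics Sisson tested can be computed from a record once the fixations are located. They are the number of pauses per line and the backward movements, or regressions, within a line. The sketch below assumes each line's record has already been reduced to a time-ordered list of horizontal fixation positions. This representation is hypothetical and is not how Ophthalmograph film is actually read.

```python
from dataclasses import dataclass


@dataclass
class LineRecord:
    """Hypothetical reduced record: horizontal fixation positions, in time order."""
    fixations_x: list[float]


def count_regressions(xs: list[float]) -> int:
    """A regression here is any fixation to the left of the one before it on the line."""
    return sum(1 for prev, cur in zip(xs, xs[1:]) if cur < prev)


def summarize(lines: list[LineRecord]) -> dict[str, object]:
    pauses_per_line = [len(line.fixations_x) for line in lines]
    return {
        "fixations": sum(pauses_per_line),
        "regressions": sum(count_regressions(line.fixations_x) for line in lines),
        # Dearborn's "rhythmical" pattern: the same number of pauses on every line.
        "same_pauses_per_line": len(set(pauses_per_line)) == 1,
    }


if __name__ == "__main__":
    record = [LineRecord([0.0, 1.0, 2.0, 1.5, 3.0]), LineRecord([0.0, 1.0, 2.0, 3.0])]
    print(summarize(record))
```

Because each line is handled separately, the return sweep from the end of one line to the start of the next is not counted as a regression.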


* Assistance in the preparation of these materials was furnished by the personnel 
of Works Progress Administration. Official Project Number 465-03-3-61, under 
the supervision of J. J. Rulon. 







Since it has been definitely established that reading can be improved 
and that individual differences exist in reading ability, the concepts 
of “habit” and of “underlying processes of assimilation” may be
complementary rather than antagonistic. The habits of a reader 
may depend, to a certain degree, upon his underlying processes of 
assimilation, rather than upon chance or external determining factors. 
The goal of improvement for the individual would then be those 
reading habits most efficient for his own assimilative capabilities, 
rather than certain ideal motor sequences which are thought to char- 
acterize good readers in general. 

One approach to the study of the above hypothesis is the method 
used so frequently in analyses of individual differences; namely, a 
comparison of genetically identical twins with twins of the fraternal 
type who present greater genetic differences.'° Jones and Wilson® 
have demonstrated that there is an “environmental factor which is
greater for the fraternal than for the identical twins” and that “its
existence should no longer be disregarded in nature-nurture studies
employing the twin method.” However, there is little basis for the 
assumption that greater similarity in the environment of the identical 
twins would in any direct way tend to produce greater similarity in 
such a function as eye movements. It is scarcely conceivable that 
an individual can copy, consciously or unconsciously, the eye-move- 
ment habits of another, since these movements are so inaccessible 
to ordinary observation. Of course, if the identical pairs of twins are 
in the same grade while the fraternal pairs are in different grades, 
the greater similarity of reading material, instruction, etc., for the 
identicals may lead to more similar methods of reading. 
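In the twin method, the comparison of interest is the correlation between the scores of the two members of each pair. It is computed separately for identical pairs, fraternal pairs, and, as the next paragraph proposes, unrelated pairs.

The sketch below makes two assumptions. First, it uses a plain Pearson correlation between first and second co-twin; other twin-study indices, such as the double-entry intraclass correlation, handle the arbitrary ordering within a pair better. Second, it uses the standard first-order partial correlation to remove one shared covariate such as age. The function names are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def intrapair_r(pairs: list[tuple[float, float]]) -> float:
    """Correlation between the first and second member of each pair."""
    return pearson_r([a for a, _ in pairs], [b for _, b in pairs])


def partial_intrapair_r(pairs: list[tuple[float, float]], covariate: list[float]) -> float:
    """First-order partial r between co-twins, holding one covariate constant:
    r_ab.c = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac**2) * (1 - r_bc**2))."""
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    r_ab = pearson_r(a, b)
    r_ac = pearson_r(a, covariate)
    r_bc = pearson_r(b, covariate)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
```

The basic comparison is to compute `intrapair_r` for identical pairs and for fraternal pairs and compare the two coefficients. Given pair scores on a covariate such as age, `partial_intrapair_r` shows how much of the resemblance remains once that covariate is held constant.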

The purpose of this study is to utilize the twin method in an 
investigation of the rôle of habit in eye movements in reading prose.*
If eye movements are entirely due to habit, then the resemblances 
between identical twin pairs, excluding the effect of similarities in 
sex, age, intelligence, and reading age, should be no greater than those 
between unrelated pairs. On the other hand, if eye movements are dependent entirely upon underlying processes of assimilation, then the relationships among twin pairs should suggest a basic similarity comparable to that reported in studies of physical characteristics and mental abilities.

* This study was carried out as part of a general twin study at the Institute of Child Welfare under the direction of Dr. Harold E. Jones. The collection of data was made possible through coöperation of the school systems in Alameda, Berkeley, Hayward, Oakland, and Richmond. The writer is indebted to Drs. Luther C. Gilbert and Herbert S. Conrad for criticism of the study and to Dr. Harold D. Carter for assistance in classification of the twins.


B. THE APPARATUS 


The Ophthalmograph.—The machine used for photographing the 
eye movements was the Ophthalmograph, distributed by the American 
Optical Company. Light from two standard automobile lamps, style 
2, 6-8 volts, 21 candle power, is reflected from the cornea of the eye 
of the subject so that it strikes the focusing telescopes. These tele- 
scopes can be adjusted in or out for focusing the amount of light to 
be received on the film and can be rotated in all directions for locating 
the reflected light. The reading material, illuminated by the lamps, is 
located under the telescopes. The line of vision is under the telescopes 
between the lamps. The light is reflected from the eye down the 
telescopes to the film in the camera-box back of the reading material, 
thereby recording the movement of both eyes upon that film. For 
focusing the light, a shutter-opener automatically brings a mirror 
into position to reflect the rays of light upon a glass. The light is in 
focus when it falls upon the point of intersection of two vertical lines 
and one horizontal line. Thirty-five mm. Dupont Super Panchro- 
matic film was used in this study. The machine has a stationary 
forehead rest and adjustable chin and head rests. Head movements 
are reduced by these devices, but not always eliminated. The adjust- 
able rests facilitate the focusing of the reflected light. 

The reading material is covered during focusing by a card on which 
are printed instructions for the subject. These instructions were 
also incorporated in the oral directions given by the experimenter. 
Detailed descriptions with illustrations have been published 
elsewhere. ® 

Reading Material.—Four cards containing fifty words each of
simple prose were used. Since the first of these served merely for
practice, the scores for this card are not included in the analysis of
the data. The first two cards contained eight lines; the last two
cards, seven. On all of the cards the lines were three and five-eighths
inches long.


C. PROCEDURE 


Administration of the Testing Program.—When the eye-movement
records were taken, the subject was, in general, at ease, interested, and
willing to coöperate. With the subject seated and the head and chin
rests adjusted, the examiner announced:


“Now I am going to take some motion pictures of your eyes while you 
read. I will explain to you just what you are to do. ‘Do you see the X here?’
(Examiner pointed to center of card cover.) I want you to look at that until 
I get your eyes in focus. Then I shall start the camera and say to you, ‘Look 
at this O.’ (Examiner pointed to the O on the card cover which marked the 
left end of the line.) And then, ‘Look at this O.’ (The examiner pointed to 
the O marking the right end of the line.) And then, ‘Read.’ (Examiner 
lifted the card cover for an instant showing the reading material below.) 
Now I want you to keep as still as possible. Don’t move your hands, feet, 
shoulders, or head. You know what happens, don’t you, when someone 
moves when you are taking a picture? I want you to read silently and to 
yourself. Don’t read out loud, and don’t move your lips while you are reading. 
When you have finished, close your eyes so that I shall know when you are 
through. Do you understand what you are to do?” 


Occasionally it was necessary to repeat the first part of the direc- 
tions. The procedure was then followed as outlined in the directions 
given. 


Despite the instructions to read silently, some individuals read the first 
card aloud. In this event, the machine was stopped, and the child was 
allowed to finish the card without realizing that he was not being photo- 
graphed. After the subject had finished reading the card, the tester smiled 
and said: 

“That was fine! Now, the next cards we are going to do differently. 
Don’t read these out loud and don’t even move your lips.” 

Other subjects failed to close their eyes on finishing, and reread the card. 
They were told: 

“When you have finished, close your eyes so that I shall know that you 
are through. When you have finished one line, go to a new one.” 

The four cards were placed in the card-holder so that a card which had
been read could easily be slipped out and a new one exposed. The cover was
lowered after each reading and the light from the eyes refocused, since
the subject usually moved his head upon closing his eyes. After the
completion of the reading, the subject was questioned briefly in order to
check comprehension.


Measures of Reading Performance.*—For computation of the
scores, prints were made from the negatives, and the scores were
read directly from the prints. The first and last lines of each card
were not considered. Since the speed of the machine was one-half
inch per second, or about seventy-six centimeters (76.2 cm.) per
minute, the reading time in minutes was obtained by multiplying the
length of the eye-movement record in centimeters by .0131, the
reciprocal of this speed.

* Rate of reading, progressions, initial regressions, and regressions within the
line are also considered in the writer’s unpublished doctor’s thesis.⁹
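The conversion from record length to reading time is plain arithmetic. A minimal Python sketch, assuming only the film speed stated above (the constant and function names are illustrative, not the author's), derives the centimeters of film per minute and shows that its reciprocal is the factor .0131:

```python
# Film speed of the record, as stated in the text: one-half inch per second.
INCHES_PER_SECOND = 0.5
CM_PER_INCH = 2.54

# 0.5 in/s * 2.54 cm/in * 60 s/min gives 76.2 cm of film per minute.
FILM_CM_PER_MINUTE = INCHES_PER_SECOND * CM_PER_INCH * 60

# Minutes of reading represented by one centimeter of record (about .0131).
MINUTES_PER_CM = 1 / FILM_CM_PER_MINUTE


def reading_time_minutes(record_length_cm: float) -> float:
    """Convert the measured length of an eye-movement record to reading time."""
    return record_length_cm * MINUTES_PER_CM


if __name__ == "__main__":
    print(f"film speed: {FILM_CM_PER_MINUTE:.1f} cm per minute")
    print(f"factor:     {MINUTES_PER_CM:.4f} minutes per cm")
    # A hypothetical 20 cm record is roughly a quarter of a minute of reading.
    print(f"20 cm record: {reading_time_minutes(20):.3f} minutes")
```

With this factor a record 76.2 cm. long corresponds to exactly one minute of reading.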

Regressions, backward movements of the eyes to obtain material
which has been overlooked or incompletely comprehended, were
counted, and the count for each fifty-word card was doubled to give
the number of regressions per hundred words.

Fixations, or pauses of the eye to read, were obtained by adding 
the number of progressions, or forward movements of the eyes, per 
hundred words to the number of regressions per hundred words. 
Although the nature of a fixation is different from that of a progression 
or of a regression, there can be no new fixation without a progression 
or a regression. The number of fixations, therefore, equals the sum 
of the progressions and the regressions. 
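The rules for regressions and fixations can be sketched as follows. This is an illustration only: it follows the text's simple doubling rule by treating each card as fifty words, and the helper names and sample counts are hypothetical.

```python
# Each card carried fifty words of prose; a count from one card is
# therefore doubled to express it per hundred words.
WORDS_PER_CARD = 50


def per_hundred_words(count: int, words: int = WORDS_PER_CARD) -> float:
    """Scale a raw count from one card to a rate per hundred words."""
    return count * 100 / words


def fixations_per_hundred_words(progressions: int, regressions: int) -> float:
    """Every new fixation follows either a progression or a regression,
    so fixations are taken as the sum of the two movement counts."""
    return per_hundred_words(progressions) + per_hundred_words(regressions)


if __name__ == "__main__":
    # Hypothetical counts read from one card's record.
    progressions, regressions = 38, 8
    print("regressions per 100 words:", per_hundred_words(regressions))
    print("fixations per 100 words:  ",
          fixations_per_hundred_words(progressions, regressions))
```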

Average pause duration, or average time per fixation, was obtained 
from the division of the total time by the number of fixations. A 
slight error is introduced by this method, of course, since the eyes are 
in motion approximately six per cent of the time.” 
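The pause-duration calculation can be sketched in the same way, again with hypothetical names and data. The optional correction for time spent in motion is an assumption added for illustration; the text divides total time by the number of fixations without it, which is the source of the slight error it mentions.

```python
def average_pause_duration(total_time_seconds: float, fixations: int,
                           saccade_fraction: float = 0.0) -> float:
    """Mean duration of a fixation, in seconds.

    With saccade_fraction=0 this is the plain division described in the text.
    A nonzero saccade_fraction (for example the roughly 0.06 cited there)
    removes the share of time the eyes spend moving; this correction is an
    illustrative assumption, not part of the original procedure.
    """
    if fixations <= 0:
        raise ValueError("need at least one fixation")
    return total_time_seconds * (1 - saccade_fraction) / fixations


if __name__ == "__main__":
    # Hypothetical record: 10.5 seconds of reading with 42 fixations.
    print(round(average_pause_duration(10.5, 42), 3))        # plain division
    print(round(average_pause_duration(10.5, 42, 0.06), 3))  # motion time removed
```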


D. THE SAMPLE 


Twin Groups.—In all, two hundred four cases, or one hundred two
pairs of twins, were photographed by the writer. The diagnosis of
twin type was made by Dr. Harold D. Carter of the University of
California. The data considered for the diagnosis were color of hair
and eyes, general physical characteristics, shape of ears, and finger
prints.* No cases of doubtful type appear in this study.

* Techniques of classification used were those worked out by Siemens, Newman,
Dahlberg, etc., and now in general use.

Artificial Pairs.—A comparison group was formed from a group of
two hundred adolescents, each of whom was photographed during the
Spring and Fall of 1935 with the same Ophthalmograph while reading
the same cards used in the twin study. This group forms the
nucleus of an intensive study of various phases of adolescence,
conducted by the Institute of Child Welfare, University of California.
From this group the “artificial twins” were formed by equating twenty
pairs of girls and twenty pairs of boys in terms of chronological age,
intelligence quotients, and reading age. All of the group attended the
same school and were within one-half grade of each other. The pairs
were equated within one-tenth year in chronological age.

Since two test records were available on the Terman Group Test,
that test was selected as the measure of intelligence for equating the
pairs. The average intelligence quotient of the two administrations
was used as the most reliable measure of intelligence. The
product-moment coefficient of correlation between the scores on
the two administrations of the test was found to be .904. Since the
average of the two scores was used, the reliability of the measuring
instrument, computed by the Spearman-Brown prophecy formula,
was .950. The standard error of estimating a “true” score from a raw
score was found to equal 5.49. The pairs were equated within five
points in IQ, a little less than one standard error of estimating a true IQ from
an obtained one.® 
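The step from the correlation between administrations to the reliability of their average can be checked with the Spearman-Brown prophecy formula, r_k = k r / (1 + (k - 1) r), where k is the number of measurements averaged. A minimal sketch; the function name is illustrative:

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Reliability of the mean of k parallel measurements whose
    pairwise correlation is r (Spearman-Brown prophecy formula)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    # Terman Group Test: r = .904 between the two administrations, k = 2.
    print(f"reliability of the average IQ: {spearman_brown(0.904):.3f}")
```

Applied with k = 2 to the correlation of .922 between average V and average W in the next paragraph, the same formula gives about .959, in line with the .960 reported there.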

Forms V and W of the Stanford Reading Test were each available 
for two administrations, one preceding and one following the eye- 
movement study. The average score at the four testings was taken 
as the most reliable reading quotient. The correlation between the 
two forms at the first and second administrations was .913, at the 
third and fourth administrations, .931; the correlation between 
average V and average W was .922. The reliability of the average 
score for the four testings, by the Spearman-Brown formula, was .960, 
with the standard error of the estimated “true” average score equal to
5.12. The pairs in the comparison group were equated within five 
points on average reading score, less than one standard error of esti- 
mating a true score from an obtained one. 
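The text states the equating tolerances for the artificial pairs (same sex, chronological age within one-tenth year, IQ within five points, average reading score within five points) but not how candidates were searched. The Python sketch below shows one hypothetical way to apply those tolerances, a greedy nearest-partner search. The `Pupil` record, the scaled distance, and the search order are all assumptions, and a greedy search does not guarantee the largest possible number of pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pupil:
    """Hypothetical record for one pupil in the comparison pool."""
    ident: str
    sex: str
    age: float      # chronological age in years
    iq: float       # mean of the two Terman Group Test IQs
    reading: float  # mean score over the four Stanford Reading Test testings


# Equating tolerances stated in the text.
MAX_AGE_DIFF = 0.1
MAX_IQ_DIFF = 5.0
MAX_READING_DIFF = 5.0
EPS = 1e-9  # guards against floating-point error at the boundaries


def compatible(a: Pupil, b: Pupil) -> bool:
    """True if two pupils meet every equating criterion."""
    return (a.sex == b.sex
            and abs(a.age - b.age) <= MAX_AGE_DIFF + EPS
            and abs(a.iq - b.iq) <= MAX_IQ_DIFF + EPS
            and abs(a.reading - b.reading) <= MAX_READING_DIFF + EPS)


def distance(a: Pupil, b: Pupil) -> float:
    """Sum of differences, each scaled by its tolerance (an assumed criterion)."""
    return (abs(a.age - b.age) / MAX_AGE_DIFF
            + abs(a.iq - b.iq) / MAX_IQ_DIFF
            + abs(a.reading - b.reading) / MAX_READING_DIFF)


def greedy_pairs(pool: List[Pupil]) -> List[Tuple[Pupil, Pupil]]:
    """Pair each unmatched pupil with its closest compatible unmatched partner."""
    unmatched = sorted(pool, key=lambda p: (p.sex, p.age))
    pairs: List[Tuple[Pupil, Pupil]] = []
    while unmatched:
        first = unmatched.pop(0)
        candidates = [p for p in unmatched if compatible(first, p)]
        if not candidates:
            continue  # no admissible partner; this pupil stays unpaired
        best = min(candidates, key=lambda p: distance(first, p))
        unmatched.remove(best)
        pairs.append((first, best))
    return pairs


if __name__ == "__main__":
    pool = [Pupil("a", "F", 13.9, 110, 100), Pupil("b", "F", 14.0, 113, 102),
            Pupil("c", "M", 13.9, 110, 100), Pupil("d", "M", 14.5, 110, 100)]
    for x, y in greedy_pairs(pool):
        print(x.ident, y.ident)
```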

Twins Excluded from the Study.—Over thirty-one per cent of the 
twins tested were, for various reasons, excluded from the study. 
Among six per cent of the pairs tested, at least one member of the pair 
had less than twenty per cent visual acuity in one or both eyes. Three 
sets were lost from the fraternal group because of excessively low 
intelligence. 

Almost fifty per cent of the cases lost (approximately fourteen per
cent of the cases tested) were excluded because of illegible records.
Some records were lost because of head movements (a number of the
subjects had head colds at the time of the examination). Other
records were impossible to read because the eyelids of the subject
narrowed to such an extent that the beam of light to the camera was
cut before the subject had finished reading the selection. In other
cases the beam was cut near the end of the paragraph by extremely
long eyelashes. Approximately sixteen per cent of the pairs had one
member who did not read the complete paragraph on one or more of
the three cards. These incomplete records were consequently
excluded from the study. Although more boys than girls of both twin
types were tested, more than twice as many boys as girls were excluded
from the study for various reasons.

In the majority of the identical pairs excluded, both members of the
pair were found to have defective records; among the fraternal twin
pairs, with few exceptions, exclusions were based on defective records
for one member only. The effect of this differential selection is, of
course, to diminish the difference between the intra-pair r’s in the
identical and fraternal groups.

Description of the Sample.—The identical group is, on the average,
more than half a year younger than the fraternal group; the variability
in age of this group is slightly larger than that of the fraternal (Table
I). The artificial pairs are older than the twin groups, with a
considerably smaller spread in age. The standards adopted in the
equating necessitated large samples of approximately the same age.
Since it was not feasible, because of time and cost, to collect such a
sample at each of the age levels represented in the twin groups, the
available records were used. Even from a group of two hundred within
a narrow age range, only forty pairs of artificial twins could be obtained.
TABLE I.—CHRONOLOGICAL AGE AND GRADE COMPOSITION OF THE THREE GROUPS

                        Number         Chronological age           Grade
                       of pairs    Mean     S.D.     Range         Range
Artificial pairs          40       13.95     .89      ....      High 8-Low 9
Fraternal twins           33       13.01    2.16      7.48      Low 4-High 10
Identical twins           35       12.33    2.38      6.78      High 3-Low 10
The fraternal group is, on the average, higher in school grade than
the identical group, as one would expect from the slight difference in
average age. The range in grade is approximately the same for the
two twin groups. In all except five sets of fraternal and two of identical
twins, both members of the set were in the same grade. The difference
for one set of fraternal twins was one grade and for the remaining
sets of both groups one-half grade. The close similarity in grade
placement may indicate that, in the selection of the total sample,
some cases of marked differences in grade, particularly in the fraternal
group, were not located. As one would expect from the small
variability in chronological age, the range in grade of the artificial
pairs was only one-half year.

Since the intelligence test records were obtained from different 
schools, the type of test used was naturally different. However, the 
records are sufficiently comparable to be valid for a general descrip- 
tion of the nature of the sample. The means and the mean differences 
for the various types of tests are reported in Tables II and III, respec- 
tively. In only two cases was the type of test different for members of 
the same twin pair. 

The identical group as a whole is of approximately the same mean
intelligence as the fraternal group; namely, 96.3 and 95.0 in IQ,
respectively. The average intelligence of both groups approximates
that of the population as a whole, but is slightly below that of the
school population. The fact that the sample is below the school
population in intelligence does not mean that this group of twins is an
inferior sample of twins. Byrns² reports, in a study of one hundred
eighty-eight sets of twins, that the median intelligence of like-sex twins
was lower than the median of fifty-nine thousand five hundred fifty-nine
pupils. The mean intelligence of the artificial sets is considerably
higher than that of either of the twin groups.


TABLE II.—INTELLIGENCE QUOTIENTS OF THE THREE GROUPS CLASSIFIED AS TO
TYPE OF TEST

                    Kuhlmann-      Terman        Stanford-
                    Anderson     Group Test        Binet          Otis        All tests
                    N   Mean      N   Mean       N   Mean      N   Mean     N   Mean
Artificial pairs   ..   ....     80  113.0      ..   ....     ..   ....    80  113.0
Fraternal twins    26  104.2     24   88.7       8   85.8      2   88.0    60   95.0
Identical twins    36   97.7     13   90.9      14   97.3      2   98.5    65   96.3











In mean intra-pair difference in intelligence, both twin groups are
comparable to the samples of other studies. The mean difference is
larger for the identical twins used in this study than for the complete
sample of identical twins tested. On the other hand, the fraternal twins
finally retained are more alike than the fraternal sample tested.





TABLE III.—MEAN INTRA-PAIR DIFFERENCES IN INTELLIGENCE QUOTIENTS FOR
THE THREE GROUPS CLASSIFIED AS TO TYPE OF TEST

                    Kuhlmann-      Terman        Stanford-
                    Anderson     Group Test        Binet          Otis        All tests
                    N   Mean      N   Mean       N   Mean      N   Mean     N   Mean
Artificial pairs   (entries illegible)
Fraternal twins    14   11.0     12    8.9       3    7.3      1   10.0    30    9.8
Identical twins    18    5.8      6    5.4       7    6.3      1    5.0    32    5.8
































The mean difference in intelligence of the artificial sets is small 
because of the method of pairing. This difference is only a little more 
than one-half the standard error of estimate. 


E. RESULTS 


Reliability of Measures.—Eurich⁴ reports: “. . . Measures of eye
movements are usually reliable, since for single paragraphs which on
the average take less than one minute to read, the coefficients of
reliability for fixations and regressions vary between .70 and .87,
while for three paragraphs combined the reliability coefficient is above
.90.” The reliabilities of the measures used in this study (Table IV)
agree with the findings of Eurich. These reliabilities were obtained
from the average of three cards, one hundred fifty words. As one
would expect for the age range in this sample, the measures show a
slight improvement as the individual advances in age; the correlations
with age are negative because older readers make fewer regressions and
fixations and shorter pauses. Regressions exhibit more improvement
with an advance in age than do either of the other two measures.
Because the correlations of age with the measures are fairly low, the
reliability coefficients are little changed when age is held constant.

TABLE IV.—RELIABILITY COEFFICIENTS, CORRELATIONS WITH AGE, AND
RELIABILITY COEFFICIENTS WITH AGE CONSTANT, OF THE MEASURES OF
EYE MOVEMENTS—COMBINED TWIN GROUPS

N = 96                                   Obtained r   PE of r   r with age   Partial r
Regressions per one hundred words           .91        .011       -.33         .87
Fixations per one hundred words             .95        .007       -.15         .94
Average pause duration                      .94        .008       -.15         .93
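The probable errors in Table IV can be checked against the conventional formula for a product-moment coefficient, PE = .6745 (1 - r²) / √N. A small Python sketch (the function name is illustrative): with N = 96 it gives .007 and .008 for fixations and pause duration, and .012 for regressions, where the table's .011 presumably reflects an unrounded r slightly above .91. The intra-class coefficients in the later tables are not checked this way, since the text does not say how their probable errors were computed.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Conventional probable error of a product-moment r: .6745 (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


if __name__ == "__main__":
    N = 96  # combined twin groups, as in Table IV
    for label, r in [("regressions", 0.91), ("fixations", 0.95),
                     ("pause duration", 0.94)]:
        print(f"{label:<15} r = {r:.2f}  PE = {probable_error_r(r, N):.3f}")
```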


Analysis of Data.—Because of defective or illegible records on
individual cards, the number of cases on the different cards varies.
The performance of the three groups will, therefore, be analyzed for
the individual cards as well as for the average of the three, since
greater faith can be placed in interpretations when the results are
observed to be uniform in spite of a somewhat changing sample.

Table V presents statistical constants for the measures of eye
movements for the three groups. The artificial pairs and the fraternal
twins are fairly comparable in average number of regressive movements
made per one hundred words, but the fraternal twins make more
fixations and take longer for each. A comparison of the performance
of the identical twins with that of the other two groups shows that the
identical twins make more regressions and more fixations, and spend
more time on each fixation, than either the fraternal twins or the
artificial pairs. In most cases the means of the identical twins are
significantly higher than those of the other two groups.


TABLE V.—MEANS AND STANDARD DEVIATIONS OF THE DISTRIBUTIONS FOR
REGRESSIONS PER ONE HUNDRED WORDS, FIXATIONS PER ONE HUNDRED
WORDS, AND AVERAGE PAUSE DURATION—ARTIFICIAL PAIRS, FRATERNAL
TWINS, AND IDENTICAL TWINS

                  Artificial pairs       Fraternal twins        Identical twins
Card              A      B      C       A      B      C       A      B      C
N                80     80     68      64     54     48      66     58     60

Regressions per one hundred words
  Mean          14.6   16.2   13.6    15.1   12.9   13.9    19.0   18.3   18.4
  S.D.           8.4    7.9    8.5     8.5    7.6    9.7    11.8   11.3   11.0

Fixations per one hundred words
  Mean          84.2   82.9   79.9    ?2.8   84.3   86.5   101.6   99.8   97.3
  S.D.          14.3   14.9   16.1    21.4   18.6   23.2    22.6   21.5   21.6

Average pause duration in seconds
  Mean          .223   .236   .240    .225   .244   .245    .251   .263   .283
  S.D.          .024   .029   .029    .033   .043   .042    .045   .055   .050





In the description of the sample, it was pointed out that the frater- 
nal twins in this sample are younger than the artificial pairs and that 
the identical twins are younger than the fraternal. One would 
naturally expect the condition shown in Table V; the younger group 
is more immature in measures of eye movements than the older group. 







The intra-class coefficients of correlation’ for the artificial pairs 
(Table VI) for regressions and for fixations are, in most cases, less 
than two probable errors of the coefficient. For average pause
duration the r’s are higher, but still not significantly different from zero.
The relationships for the fraternal twin pairs are higher throughout 
than those for the artificial pairs. The lowest r’s, between one and 
two probable errors of the coefficient, are for regressions. The identical 
twins show a higher intra-pair relationship than either of the other 
two groups, the r’s for this group ranging from .40 to .72. 
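Because the members of a pair have no natural order, the intra-class coefficient enters each pair in both orders, so that a single mean and a single variance serve for both members. A minimal Python sketch of this double-entry form, as described by Fisher, follows; it illustrates the technique and is not a record of the author's worksheets, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def intraclass_r(pairs: Sequence[Tuple[float, float]]) -> float:
    """Intra-class correlation for pairs by the double-entry method.

    Each pair (x, y) is entered twice, as (x, y) and (y, x). The ordinary
    product-moment r of that symmetric table reduces to the ratio below,
    because both columns then share one mean and one variance.
    """
    values = [v for pair in pairs for v in pair]
    mean = sum(values) / len(values)
    total_ss = sum((v - mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("all scores are equal; r is undefined")
    # Cross-products counted twice, once for each ordering of the pair.
    cross = sum(2 * (x - mean) * (y - mean) for x, y in pairs)
    return cross / total_ss
```

For pairs whose members are equal, such as [(1, 1), (2, 2), (3, 3)], it returns 1.0; for perfectly reversed pairs, such as [(1, 3), (3, 1)], it returns -1.0.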


TABLE VI.—INTRA-CLASS COEFFICIENTS OF CORRELATION FOR REGRESSIONS
PER ONE HUNDRED WORDS, FIXATIONS PER ONE HUNDRED WORDS, AND
AVERAGE PAUSE DURATIONS—ARTIFICIAL PAIRS, FRATERNAL TWINS,
AND IDENTICAL TWINS

               Artificial pairs          Fraternal twins           Identical twins
Card         A     B     C    Av.      A     B     C    Av.      A     B     C    Av.
N           80    80    68    68      64    54    48    44      66    58    60    50

Regressions
  r        .04  -.20  -.01   .04     .10   .19   .21   .24     .49   .54   .60   .66
  PE of r  .09   .08   .09   .09     .09   .10   .11   .11     .07   .07   .07   .06

Fixations
  r       -.01  -.02   .14   .15     .42   .72   .28   .46     .44   .40   .56   .61
  PE of r  .09   .09   .09   .09     .08   .08   .10   .09     .07   .07   .07   .07

Average pause duration
  r        .14   .19   .35   .24     .51   .40   .14   .53     .69   .56   .52   .72
  PE of r  .09   .08   .09    ?      .07   .09    ?    .09     .05   .07   .07   .05





























A convenient device for interpreting an obtained coefficient of
correlation is the index of forecasting efficiency,’ $E = 100\left(1 - \sqrt{1 - r^2}\right)$
(Table VII). Although the predictive values are small, they are


TABLE VII.—INDEX OF FORECASTING EFFICIENCY FOR INTRA-CLASS COEFFICIENTS
OF CORRELATION—AVERAGES OF THREE CARDS

                    Regressions per     Fixations per       Average pause
                    one hundred words   one hundred words   duration
Artificial pairs           0                   1                  3
Fraternal twins            3                  11                 15
Identical twins           25                  21                 31


higher in every case for the fraternal twins than the artificial pairs, and 
substantially higher for the identical twins than for either of the other 
two groups. 
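The index is easy to recompute from the averages in Table VI. A Python sketch (function and variable names are illustrative) that applies E = 100(1 - √(1 - r²)) to those averages; rounded to whole numbers, the results agree with the entries of Table VII.

```python
import math


def forecasting_efficiency(r: float) -> float:
    """Index of forecasting efficiency, E = 100 (1 - sqrt(1 - r^2)), in per cent."""
    return 100 * (1 - math.sqrt(1 - r * r))


# Intra-class r's for the averages of three cards (regressions, fixations,
# average pause duration), taken from Table VI.
AVERAGE_R = {
    "artificial pairs": (0.04, 0.15, 0.24),
    "fraternal twins": (0.24, 0.46, 0.53),
    "identical twins": (0.66, 0.61, 0.72),
}

if __name__ == "__main__":
    for group, rs in AVERAGE_R.items():
        print(group, [round(forecasting_efficiency(r)) for r in rs])
```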

Composite Measure.—For a further analysis of the intra-pair 
relationships in the twin groups a composite measure was determined
by averaging the sigma score values for average pause duration, 
regressions, and fixations. The sigma scores were computed for the 
two types of twins combined. All of the twin pairs were used, by 
taking the average of the cards read. In terms of the composite 
measure (Table VIII), both groups show a substantial twin-resem- 
blance with the similarity of the identical twins considerably greater 
than that of the fraternal twins. 
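The composite can be sketched as follows, assuming each individual is described by the three measures and that sigma scores are computed over both twin types pooled, as the text states. All three measures run in the same direction (higher values mean more regressions, more fixations, or longer pauses), so averaging their sigma scores gives a single index of consistent sign. The names and data layout are hypothetical; the resulting pairs of composites would then be correlated by the intra-class method.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each person is a dict holding the three measures.
Person = Dict[str, float]
MEASURES = ("regressions", "fixations", "pause")


def sigma_scores(values: Sequence[float]) -> List[float]:
    """Standard (sigma) scores of the values within their own group."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("no variability; sigma scores are undefined")
    return [(v - mean) / sd for v in values]


def composite_pairs(pairs: Sequence[Tuple[Person, Person]]) -> List[Tuple[float, float]]:
    """Average each person's sigma scores on the three measures.

    Sigma scores are computed over every individual in `pairs` pooled,
    mirroring the text's use of both twin types combined.
    """
    people = [person for pair in pairs for person in pair]
    z_by_measure = [sigma_scores([p[m] for p in people]) for m in MEASURES]
    # zip(*...) regroups the three scores person by person before averaging.
    composite = [statistics.fmean(zs) for zs in zip(*z_by_measure)]
    # People were listed pair by pair, so even and odd positions are the two members.
    return list(zip(composite[0::2], composite[1::2]))
```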


TABLE VIII.—INTRA-PAIR RELATIONSHIPS OF FRATERNAL AND IDENTICAL TWINS
ON THE AVERAGE SIGMA-SCORE COMPOSITE OF REGRESSIONS, FIXATIONS,
AND AVERAGE PAUSE DURATION

                     N      r     PE of r
Fraternal twins     66     .40      .08
Identical twins     70     .61      .06














F. DISCUSSION 


The results of this study indicate more fundamental bases for 
individual differences in eye movements in reading than are recognized 
by the advocates of the “habit” concept. The intra-pair relationships 
cannot readily be attributed to similarities in the environment of the 
twin pairs. Also the data from the artificial pairs indicate that these 
relationships are not dependent, to any great extent, upon similarities 
in age, intelligence quotient, or reading age. The “underlying
processes of assimilation” constitute a more logical explanation of the
results in this sample. 

To the extent that the correlation among identical twins exceeds
that for fraternal twins, and the correlation among fraternal twins
exceeds that for artificial pairs, we may infer that hereditary factors
play a significant rôle in the various measurements. On this basis, it
appears that average pause duration is more largely determined by
hereditary factors than the other two measures considered in this
study. The inference is that training designed to develop certain
desired habits of eye movements would be less effective for average
pause duration than for the other measures. Although the r’s for
fixations are slightly lower than for average pause duration, there is
also evidence, in this measure, of the effect of heredity.

Our inference from the correlations in the three groups (identicals, 
fraternals, and artificial pairs) is that regressions should respond more to 
training than either of the other two measures. The advocates of the 
“habit” concept can argue that, since regressions are not essential 
to reading, they are either the result of bad habits or insufficiency of 
vocabulary, relevant information, etc. However, the high relation- 
ships, .49 to .66, found between the identical twins, would seem to 
indicate that even regressions are dependent upon factors other 
than environment and training. 

In conclusion, this study provides support for the view that eye 
movements are primarily an outcome or correlate, rather than the 
cause, of poor reading. In the treatment of poor reading,
consideration should be given to the capabilities of the individual, rather than
exclusively to certain ideal motor sequences of reading. 


G. SUMMARY 


Previous studies of eye movements in reading have led to the 
formulation of conflicting theories concerning the rôle of habit in
eye movements. The present study has utilized the twin method 
as a means of appraising the importance of habit factors in eye move- 
ments during the reading of simple prose. 

The sample consisted of thirty-three pairs of fraternal twins, 
thirty-five pairs of identical twins, and forty “artificial pairs” (from
a population of single-born) equated on the basis of CA, IQ, reading 
age, grade position, and socio-economic factors. 

The eye-movement records were taken with the Ophthalmograph, 
which is distributed by the American Optical Company. Four cards 
containing fifty words each of simple prose were used for the reading 
material, the first being used as a practice card. The measures of 
eye movements used were regressions per hundred words, fixations per 
hundred words, and average pause duration. 

(1) In terms of intelligence quotients and of mean difference in 
intelligence, the two twin groups were comparable to twins used in 
other studies. The artificial pairs were higher in intelligence and 
had a smaller intra-pair difference than either of the twin groups. 

(2) The twin groups ranged in grade from the third to the tenth
and had an average age between twelve and thirteen years. The
equated pairs were, on the average, fourteen years old and were all
in the last semester of the eighth grade or the first of the ninth.

(3) The reliability coefficients of the three measures, age constant, 
ranged from .87 to .94. 

(4) The fraternal twins and the artificial pairs were fairly com- 
parable in terms of mean performance on the various measures, but 
the identical twins, on the average, had longer pause durations and 
made more regressions and more fixations than either of the other 
two groups. 

(5) The intra-pair coefficients of correlation for the artificial 
pairs were all low, less than 4 PE of the coefficients. The highest 
coefficients were those for average pause duration. The relationships 
between the fraternal pairs were significantly different from zero for 
fixations per hundred words and for average pause duration. The r’s 
in this group for regressions per hundred words were between one and 
two probable errors of the coefficients. The identical twins showed 
significant relationships, higher than those of either of the other two 
groups, on all of the measures. The range of r’s for this group was 
from .40 to .72. 

(6) When the r’s for the averages of the three cards were trans- 
lated in terms of E, the index of forecasting efficiency, the values of E 
for the artificial pairs ranged from zero to three, for the fraternal 
group from three to fifteen, and for the identical group from twenty- 
one to thirty-one. 

(7) The intra-pair correlation for the fraternal group on the average 
sigma-score composite of the three measures was .40, and that for 
the identical twins was .61. 

(8) The findings of this study are in harmony with the statement 
that faulty eye-movement patterns are an outcome or correlate, rather 
than the cause, of reading disabilities. The “habit” concept, as an
exclusive explanation, appears untenable in the light of results from 
this study. 

(9) The goal of remedial reading should be the development of 
the individual in respect to his capabilities, rather than the formal 
development of certain eye-movement patterns which have been 
found to accompany efficient reading. 


BIBLIOGRAPHY 


1. Buswell, G. T.: Fundamental Reading Habits. Supplementary Educational
Monographs, No. 21, University of Chicago Press, Chicago, 1920, 150 pp.

2. Byrns, R. K.: “Mental abilities of twins.” School and Society, Vol. xl,
November 17, 1934, pp. 671-672.

3. Dearborn, W. F.: The Psychology of Reading. Columbia University
Contributions to Philosophy and Psychology, Vol. xiv, No. 1, 1906.

4. Eurich, A. C.: “Fourth- and fifth-grade standards for photographic
eye-movement records.” Pedagogical Seminary and Journal of Genetic
Psychology, Vol. xliii, December, 1933, pp. 466-471.

5. Fisher, R. A.: Statistical Methods for Research Workers. Oliver and Boyd,
London, Second edition, 1928, 269 pp.

6. Garrett, H. E.: Statistics in Psychology and Education. Longmans, Green and
Co., New York, Revised edition, 1937, 493 pp.

7. Hull, Clark: Aptitude Testing. World Book Co., Yonkers-on-Hudson,
1928, 535 pp.

8. Jones, H. E., and Wilson, P. T.: “Reputation differences in like-sex twins.”
Journal of Experimental Education, Vol. i, December, 1932, pp. 86-91.

9. Morgan, D. H.: “Motor factors in reading: a cinemanalysis of eye movements
of identical and like-sex fraternal twins in reading prose.” Unpublished
Doctor’s Thesis, University of California, 1938.

10. Newman, H. H., Freeman, F. N., and Holzinger, K. J.: Twins: A Study of
Heredity and Environment. University Press, Chicago, 1937, 369 pp.

11. Pollock, M. C., and Pressey, L. C.: “An investigation of the mechanical habits
in good and poor readers.” Educational Research Bulletin, Vol. iv,
September 23, 1925, pp. 273-275.

12. Robinson, F. P.: The Rôle of Habit in Eye Movements in Reading, with an
Evaluation of Techniques for their Improvement. University of Iowa Studies,
No. 39, 1933.

13. Sisson, E. D.: “Habits of eye movements in reading.” Journal of Educational
Psychology, Vol. xxviii, September, 1937, pp. 437-450.

14. Sisson, E. D.: “The rôle of habit in eye movements in reading.” The
Psychological Record, Vol. i, August, 1937, pp. 159-168.

15. Taylor, E. A.: Controlled Reading. University of Chicago Press, Chicago,
1937, 367 pp.

16. Tinker, M. A.: “The rôle of eye movements in diagnostic and remedial
reading.” School and Society, Vol. xxxix, February 3, 1934, pp. 147-148.

17. Tinker, M. A., and Frandsen, A.: “Evaluation of photograph measures of
reading.” Journal of Educational Psychology, Vol. xxv, February, 1934,
pp. 96-100.







RESPONSES OF A GROUP OF GIFTED CHILDREN 
TO THE PRESSEY INTEREST-ATTITUDE TEST 


ROBERT L. THORNDIKE 


Teachers College, Columbia University 


In 1933 Pressey and Pressey? reported a new personality test, an 
outgrowth of the X-O test, which they called an Interest-Attitude 
Test. This test was made up of items chosen to be discriminating 
on the basis of maturity. There were four subtests, the first having 
to do with things thought wrong, the second with things worried about, 
the third with things interested in, and the fourth with qualities 
admired in people. In the first test, the individual was instructed to 
check the items which he considered wrong, and to double check those 
that he thought were very wrong. The same procedure was followed 
in the other subtests. Each subtest consisted of ninety items. 

Each item of the test was selected to show a progressive change in 
frequency of checking as one went from elementary-school to high- 
school to college groups. With the exception of a few ambiguous 
items, the item-by-item norms* show this to be the case.
Unfortunately, it was not possible to get an equal number of items that
showed ascending and descending curves. Most of the items showed 
fewer checks with increasing maturity. This was so for eighty items 
in Tests 1 and 2, sixty in Test 3, and forty in Test 4. The possibility 
of individual scores being to a considerable extent a function of a 
mere general “tendency to check items indiscriminately” has been
discussed elsewhere.* 

The Pressey Interest-Attitude Tests were administered in October, 
1938, to a group of twenty-five gifted boys and twenty-four gifted 
girls in Speyer School, P.S. 500, New York City. The Speyer School
experiment has already been described,¹ so we will not elaborate upon
it here. The group consisted of children whose Stanford-Binet IQ’s 
were, with one or two exceptions, 130 or over. Their ages, at the time 
of testing, ranged from nine and two-tenths to twelve and four-tenths 
years, with a median of eleven years. The median IQ was 141. In 
Tables I and II will be found the age, IQ, and estimated mental age 
of each child. The intelligence tests were administered in 1936, so 





*Item-by-item norms were made available through the courtesy of S. L. Pressey.


















the present mental age is an estimate based on IQ rather than a direct 
measurement. 
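A minimal sketch of that estimate, assuming (the article does not state the formula) that the 1936 IQ was held constant and the ratio-IQ definition IQ = 100 × MA / CA was solved for MA at the 10/1/38 chronological age. The function name is illustrative only; the check values are taken from Table I.

```python
def estimated_mental_age(iq: float, ca_years: float) -> float:
    """Estimate present mental age (in years) from an earlier IQ and current CA.

    Assumes the IQ is unchanged, so IQ = 100 * MA / CA can be solved for MA.
    """
    return iq * ca_years / 100.0


# Boy No. 2 in Table I: CA 12.2, IQ 138; the table lists MA 16.8.
print(round(estimated_mental_age(138, 12.2), 1))  # 16.8
# Boy No. 11: CA 9.8, IQ 163; the table lists MA 16.0.
print(round(estimated_mental_age(163, 9.8), 1))   # 16.0
```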

The first step was to determine for each child the total score and 
the score on each subtest of the Pressey Test. These were converted 
into age equivalents by means of Pressey’s norms. These age equiva- 
lents for total score and for each subtest are included in Tables I and 
II. An examination of these scores indicates, in the first place, that 


TABLE I.—PRESSEY TEST
Boys

No. | Chronological age, 10/1/38 | Intelligence quotient | Mental age, 10/1/38 | Emotional age, total score, 10/10/38 | Emotional quotient | Subtest scores: Wrongs | Worries | Interests | Traits
1 | 10.1 | 155 | 15.7 | 14.0 | 139 | 11.9 | 13.2 | 14.5 | 16.0
2 | 12.2 | 138 | 16.8 | 16.4 | 134 | 14.7 | 21.0 | 16.2 | 14.3
3 | (illegible) | 146 | 17.1 | 16.5 | 141 | 22+ | 12.7 | 14.5 | 22+
4 | 10.3 | 150 | 15.5 | 14.4 | 140 | 16.6 | 18.7 | 9.7 | 14.7
5 | 9.6 | 138 | 13.2 | 13.9 | 145 | 10.5 | 14.7 | 12.2 | 18.7
6 | 11.4 | 194+ | 22.1+ | 19.3 | 169 | 20.7 | 16.7 | 18.3 | 22.0
7 | 10.4 | 147 | 15.3 | 12.1 | 116 | 12.5 | 11.3 | 16.0 | 10.7
8 | 10.2 | 120 | 12.2 | 11.7 | 115 | 16.8 | 12.2 | 8.7 | 13.2
9 | 12.4 | 148 | 18.4 | 21.5 | 173 | 22+ | 19.7 | 22.0 | 19.0
10 | 10.2 | 140 | 14.3 | 12.6 | 124 | 8.6 | 17.0 | 10.1 | 21.5
11 | 9.8 | 163 | 16.0 | 18.5 | 189 | 16.6 | 19.5 | 18.7 | 18.7
12 | 11.7 | 139 | 16.3 | 16.6 | 142 | 14.3 | 18.3 | 16.7 | 16.7
13 | 9.7 | 170 | 16.5 | 16.9 | 174 | 12.3 | 18.7 | 19.3 | 18.3
14 | 10.4 | 163 | 17.0 | 14.2 | 137 | 10.3 | 16.2 | 17.3 | 14.0
15 | 10.9 | 144 | 15.7 | 18.9 | 173 | 16.8 | 19.7 | 18.7 | 20.0
16 | 10.0 | 142 | 14.2 | 10.6 | 106 | 11.5 | 8.5 | 8.0 | 22+
17 | 11.4 | 137 | 15.6 | 14.6 | 128 | 11.7 | 14.5 | 17.0 | 15.3
18 | 12.1 | 145 | 17.5 | 19.5 | 161 | 22+ | 19.7 | 12.6 | 22+
19 | 12.4 | 146 | 18.1 | 17.7 | 143 | 22+ | 18.7 | 12.4 | 13.8
20 | 11.0 | 175 | 19.3 | 14.0 | 127 | 16.0 | 13.6 | 11.7 | 14.3
21 | 11.2 | 140 | 15.7 | 14.7 | 131 | 19.5 | 14.7 | 16.0 | 11.6
22 | 11.8 | 138 | 16.3 | 11.7 | 99 | 10.0 | 11.5 | 12.6 | 15.7
23 | 11.1 | 158 | 17.5 | 20.4 | 184 | 19.5 | 19.7 | 17.7 | 22+
24 | 10.6 | 158 | 16.7 | 15.6 | 147 | 17.5 | 16.7 | 11.7 | 16.5
25 | 10.2 | 130 | 13.3 | (illegible) | 128 | 10.8 | 16.7 | 9.9 | 16.0
Median | 10.9 | 146 | 16.0 | 14.7 | 140 | 16.0 | 16.7 | 14.5 | 16.5
































these children are well in advance of their years in their performance
on this test. The twenty-five boys, with median CA of 10.9, median
IQ of 146, and median estimated MA of 16.0, have a median “emotional
age,” to use Pressey’s tentative term, of 14.7 and a median
“emotional quotient” of 140. The twenty-four girls, with median
CA of 11.1, median IQ of 138, and median estimated MA of 15.3,







Responses to the Pressey Interest-Attitude Test 589 


have a median “emotional age” of 14.1 and a median “emotional
quotient” of 131. The age equivalent on this test falls much closer to
mental age than to chronological age. Whatever the test as a whole
measures appears to be very closely related to abstract intelligence, or,
conversely, bright children seem to be decidedly advanced in whatever
the test measures.
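The tabled quotients are consistent with a ratio of the same form as the IQ, EQ = 100 × emotional age / chronological age. The sketch below assumes that definition (it is not spelled out in this article) and checks it against individual rows of Tables I and II; the function name is hypothetical.

```python
def emotional_quotient(emotional_age: float, ca_years: float) -> int:
    """Ratio-style quotient: 100 * emotional age / chronological age, rounded."""
    return round(100 * emotional_age / ca_years)


print(emotional_quotient(14.0, 10.1))  # boy No. 1 in Table I: 139
print(emotional_quotient(14.7, 9.2))   # girl No. 1 in Table II: 160
print(emotional_quotient(14.7, 10.9))  # boys' medians: 135, not the tabled 140
```

The formula applies to individual children, not to group medians: the boys' median emotional age over their median CA gives 135 rather than the tabled median quotient of 140, because the median of a set of ratios is not the ratio of two medians.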


TABLE II.—PRESSEY TEST
Girls

No. | Chronological age | Intelligence quotient | Mental age, 10/1/38 | Emotional age, total score | Emotional quotient | Subtest scores: Wrongs | Worries | Interests | Traits
1 | 9.2 | 184+ | 16.9+ | 14.7 | 160 | 15.4 | 13.7 | 14.4 | 13.2
2 | 10.8 | 138 | 14.9 | 12.0 | 111 | 10.0 | 13.7 | 9.4 | 12.1
3 | 11.2 | 149 | 16.7 | 15.4 | 137 | 17.8 | 13.2 | 14.4 | 17.7
4 | 10.3 | 171 | 17.6 | 13.6 | 132 | 12.5 | 12.2 | 13.4 | 15.7
5 | 12.1 | 141 | 17.1 | 21.1 | 174 | 22+ | 20.0 | 14.8 | 22+
6 | 9.8 | 143 | 14.0 | 11.8 | 120 | 11.1 | 13.1 | below 8 | 17.3
7 | 11.2 | 135 | 15.1 | 15.4 | 138 | 16.2 | 21.0 | 10.2 | 17.5
8 | 9.9 | 131 | 13.0 | 10.3 | 104 | 9.0 | below 8 | 13.5 | 14.2
9 | 9.3 | 200+ | 18.6+ | 13.1 | 141 | 14.0 | 11.9 | 9.4 | 17.5
10 | 9.9 | 140 | 13.9 | 13.0 | 142 | 10.2 | 18.5 | 10.8 | 11.2
11 | 11.5 | 134 | 15.4 | 14.1 | 123 | 9.3 | 19.0 | 16.0 | 13.2
12 | 11.4 | 138 | 15.7 | 15.8 | 139 | 17.0 | 18.0 | 13.9 | 14.4
13 | 11.9 | 132 | 15.7 | 13.9 | 117 | 11.7 | 11.6 | 15.0 | 18.0
14 | 10.8 | 132 | 14.3 | 10.7 | 99 | below 8 | 11.6 | 12.5 | 12.9
15 | 10.1 | 138 | 13.9 | 14.0 | 139 | 11.7 | 13.8 | 14.6 | 13.8
16 | 11.9 | 131 | 15.6 | 14.8 | 124 | 12.7 | 21.0 | 12.7 | 15.7
17 | 11.2 | 156 | 17.5 | 13.7 | 122 | 10.4 | 14.4 | 12.0 | 16.7
18 | 10.5 | 172 | 18.1 | 14.2 | 135 | 15.0 | 13.7 | 14.0 | 12.9
19 | 11.1 | 130 | 14.4 | 16.2 | 146 | 18.3 | 18.0 | 14.4 | 14.0
20 | 11.2 | 124 | 13.9 | 14.5 | 130 | 12.5 | 19.5 | 10.0 | 17.7
21 | 11.2 | 137 | 15.3 | 14.6 | 130 | 14.5 | 15.0 | 14.8 | 13.4
22 | 11.2 | 152 | 17.0 | 13.0 | 116 | 10.3 | 14.0 | 10.0 | 14.4
23 | 9.2 | 142 | 13.1 | 9.8 | 106 | below 8 | below 8 | 9.0 | 16.7
24 | 12.0 | 120 | 14.4 | 16.9 | 140 | 14.7 | 20.0 | 17.3 | 16.3
Median | 11.1 | 138 | 15.3 | 14.1 | 131 | 12.5 | 13.9 | 13.5 | 15.1





























Consideration of the scores on the four subtests shows that the 
maturity of the gifted children is greatest in Subtest 4—traits which 
they admire in others, and Subtest 2—worries and fears. The 
maturity is least marked in Subtest 1—things considered wrong, and 
in Subtest 3—interests. The differences between either 4 or 2 and























































either 1 or 3 were statistically reliable. The boys were more mature 
than the girls in each subtest, and most conspicuously so in the judg- 
ments of wrong. It is interesting that in this group the superiority 
of the boys to the girls on the Pressey Test corresponded almost 
exactly to their superiority on the Stanford-Binet. 

The tests were next subjected to an item-by-item analysis. The 
number of checks recorded on each item by the total group of boys was 
determined, and this was multiplied by the appropriate factor to 
give the number of checks per hundred subjects. (This is the form 
in which Pressey’s item-by-item norms are presented.) The same 
procedure was followed for the girls. It was then possible, by com- 
paring our gifted groups with Pressey’s norms, to assign an approxi- 
mate grade equivalent to the group response on each item. For some 
of the items, the norms did not show clear and unequivocal change as 
one went up through the grades. These items could not be used in the 
detailed analysis. 
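The scaling step amounts to multiplying each raw count by 100 / N: a factor of 4 for the twenty-five boys and about 4.17 for the twenty-four girls. A sketch, with hypothetical counts for a single item:

```python
def checks_per_hundred(checks: int, group_size: int) -> float:
    """Convert a raw count of checks on one item to checks per 100 subjects,
    the form in which Pressey's item-by-item norms are given."""
    return checks * 100 / group_size


# Hypothetical counts for one item.
print(checks_per_hundred(10, 25))  # 25 boys, factor 4: 40.0
print(checks_per_hundred(6, 24))   # 24 girls, factor 100/24: 25.0
```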

The grade equivalent for any single item is very unreliably determined.
However, the general character of the items on which the group
achieves very mature placement may shed some light on the special
areas of maturity in the gifted child.
To cut down somewhat the number of items that creep into our list 
just because of the vagaries of sampling, we will present only those 
items which receive a very mature placement (or a very immature one) 
for both boys and girls. A very mature placement is considered to
be one where the frequency of checking in the gifted group places 
them at or beyond the level of high-school juniors. A very immature 
placement is defined as one at or below the sixth-grade level. This 
may still be above the chronological age of our group, but is well 
below their mental-age level. 
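The selection rule can be written as a small filter. In this sketch, grade equivalents are coded as school grades (6 for sixth grade, 11 for high-school juniors, 13 and up for college); that numeric coding and the function name are assumptions made for illustration. An item whose norms showed no clear trend has no grade equivalent and is passed as None.

```python
from typing import Optional

HS_JUNIOR = 11    # assumed numeric code for high-school juniors
SIXTH_GRADE = 6


def placement(boys_grade: Optional[float], girls_grade: Optional[float]) -> Optional[str]:
    """Classify one item from the grade equivalents of the two groups.

    An item is listed only if boys AND girls both reach the same extreme;
    None as input means the norms gave no usable grade equivalent.
    """
    if boys_grade is None or girls_grade is None:
        return None
    if boys_grade >= HS_JUNIOR and girls_grade >= HS_JUNIOR:
        return "very mature"
    if boys_grade <= SIXTH_GRADE and girls_grade <= SIXTH_GRADE:
        return "very immature"
    return None  # not extreme for both groups, so not listed


print(placement(12, 13))    # very mature
print(placement(5, 6))      # very immature
print(placement(12, 9))     # None: only the boys are very mature
print(placement(None, 12))  # None: item unusable
```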

Table III lists the items from Subtests 1, 2, and 3 for which the 
group responded at a very mature or very immature level. Items 
from Subtest 4 are not included. It was found that on this test a 
great many checks were given to all items, rather indiscriminately. 
The group seemed to have a very low “checking threshold.” The 
result was that upon most of those items where a high score indicated 
maturity our group reached the high-school or college level, while 
upon most of those items where a high score indicated immaturity 
our group fell at or below the sixth-grade level. In general, it seemed 
that this group tended to check a great many admired traits, very 
few fears, and a moderate number of interests and things considered 
wrong. 







TABLE III.—VERY MATURE AND VERY IMMATURE RESPONSES


MATURITY 


Shown by infrequent checking of: 
Freak 


Reading novels 
Day-dreaming 
Poker 

Tobacco 
Bashfulness 
Being shabby 
Playing cards 
Giggling 


Giggling 
Lightning 
Storms 
Craziness 
Medicine 
Sins 


Smoking 


Bribery 
Being conceited 


Immodesty 
Lawlessness 


Beaches 
Magazines 


As wrong 


As feared or 
worried about 


As wrong 


IMMATURITY 


Fighting 
Speeding 
Flirting 
Peddling 
Screaming 
Begging 
Gang 
Strike 
Being clumsy 
Slickness 
Shouting 


Yelling 


Drawing 
Cartoonist 
Engineers 
Carnival 
Bicycling 
Baseball 

Medals 
Collecting stamps 
Prizes 

Baseball player 
Roller skating 
Picture puzzles 
Scrap books 
Story writing 
Red Cross work 
Geography games 


Cash 
Clothes 
Money 


Shown by frequent checking of: 


As wrong 


As liked or 
interested in 


As feared or 
worried about 










TABLE III.—Continued
MATURITY IMMATURITY
Shown by frequent checking of: Shown by infrequent checking of:
Fiction 
Day-dreaming As liked or Auto driving As liked or 
Doctors interested in interested in 
Photography 
Professors 
Science 
Social affairs 


It is not possible to draw anything more than suggestions from 
Table III. We tentatively offer the following generalizations: 

(1) Things Considered Wrong.—This group has a mature disregard 
for conventional prohibitions on such things as playing cards, smoking, 
reading novels, dancing, etc., but tends to be severe in its judgment of 
personal antisocial behavior, whether it be in the childish form of 
yelling and screaming or in the mature form of bribery and lawlessness. 

(2) Things Worried About.—This group seems to report a fairly 
general and unspecialized lack of fears and worries, both childish and 
adult. They have sloughed off most of the fears of childhood to 
about the extent of the junior or senior high-school student, without 
as yet showing much concern for the more mature worries. 

(3) Interests.—While manifesting an interest typical of their age
groups in such activities as baseball, bicycling, and collecting postage 
stamps, this group is alive to a number of mature interests, particularly 
of an intellectual type. In their response to baseball, they are boys 
and girls, while in their response to science they are college men and 
women. 

The finding that upon a number of items this group of children 
responds in the same way as a typical group of college students while 
on other items they respond in the same way as a typical group of 
grade-school children suggests that the maturity which the test 
measures is quite heterogeneous. It seems that certain items are very 
closely related to mental development, while other items have almost 
no such relationship. Conceivably, there might, in the same way, be
certain items very closely related to physical development, others very
closely related to social development. One is led to question the
meaningfulness of a single score obtained by putting together such
diverse items.

The problem here is similar to that which has arisen concerning the 
unity of the intelligence which we have measured with intelligence 















tests. Such work as that of Thurstone would suggest that a single 
score on an intelligence test is only a rather poor first approximation 
to scores on a number of distinct abilities. Our results suggest that 
it may be even more true that an “emotional maturity” score is a 
lump average of rather unrelated items of development. If a score 
or scores on such a test as the one considered here are to be meaning- 
ful, further analysis of the relationships within the test would seem 
to be indicated. We need to know more about which items go 
together, and what external variables each cluster goes with. 


SUMMARY 


(1) A group of forty-nine gifted boys and girls were given the 
Pressey Interest-Attitude Test.

(2) Their score on the test as a whole was found to correspond 
more nearly to their mental age than to their chronological age. 

(3) They were most mature in the selection of traits which they 
admired and in their freedom from fears and worries; less so in their 
judgments of wrong and their interests. 

(4) Individual test items showed wide variations in maturity score. 


BIBLIOGRAPHY 


1. Hollingworth, L. S.: “The Founding of Public School 500: Speyer School.”
Teachers College Record, Vol. xxxviii, 1936-1937, pp. 119-128.

2. Pressey, S. L. and Pressey, L. C.: “Development of the Interest-Attitude
Tests.” J. Appl. Psychol., Vol. xvii, 1933, pp. 1-16.

3. Thorndike, R. L.: “Critical Note on the Pressey Interest-Attitude Test.”
J. Appl. Psychol., Dec., 1938.








SOME IMPRESSIONS OF THE REVISED 
STANFORD-BINET SCALE¹


MORRIS KRUGMAN 
Chief Psychologist, Bureau of Child Guidance, Board of Education, New York City 


Although the Stanford-Binet was the standard instrument of the 
clinical psychologist for more than twenty years, the hope that it 
would be revised existed almost from the first year the test was used. 
Its weaknesses were obvious: standardization and validation were not 
what they should be; the scale did not go low enough; it did not dis- 
criminate at the upper levels, except for very young children; there 
were gaps that should be filled in; results seemed too high at the lower 
levels and too low at the upper levels; tests were misplaced; other tests 
did not seem to add to the effectiveness of the battery; emphasis on 
verbal material was much too great; too much credit was given for rote 
memory, especially at the upper levels; it was unfair to certain popula- 
tions, etc. For approximately ten years rumors persisted that a 
revision was imminent, and it was expected perennially. After long 
expectation, and after all hope that a revision would ever appear had 
practically vanished, a copy of Terman and Merrill’s book arrived
unannounced in the mail one morning more than a year and a half ago.
At last we had the revision.

There was immediate impatience to learn the new test and to try it. 
The first reaction to the trial was as strange as it was unanimous, at 
least among the psychologists known to the writer, both in and outside 
the Bureau of Child Guidance. It was one of almost complete dis- 
appointment. How much of this disappointment was due to loyalty 
to the earlier scale, which was, after all, an old friend; how much to 
the inconvenience of struggling with a new instrument; and how much 
to legitimate factors, is difficult to say, but much of it must have had a 
real basis, since, in spite of the fact that the pendulum has swung con- 
siderably the other way, and most psychologists seem to have become 
reconciled to the new scale, it is still not wholeheartedly accepted. 

For this paper, the writer approached the evaluation of Form L 
of the new scale by four different methods: 

(1) Impressions of ten psychologists of the Bureau of Child 
Guidance on eight specific questions were obtained. 


¹ Read for the writer by Mr. S. Goldberg at the meeting of the Association of
Psychologists of the New York City Public Schools, held at Columbia University,
January 27, 1939.










Some Impressions of the Revised Stanford-Binet 595 


(2) An examination of one thousand two hundred cases was made to test
the complaint that frequently we cannot be certain, after obtaining a
basal and a final year, that the child has been adequately measured.

(3) A study of ninety clinic cases, in which both the old and the 
new scales were administered to the same children, was made. 

(4) A study was made of surveys conducted in four schools in 
widely separated areas of the city, surveys in which individual exami- 
nations were administered to entire grades. In these schools the old 
and the revised scales were administered at different times and random 
samplings of their populations were obtained by examining one or all 
of the first three grades. There were one thousand three hundred 
sixty-one children to whom the old Stanford-Binet scale had been 
administered, and four hundred three who were given Form L. 




PART 1


Considering these four divisions in order of listing, let us return 
to the first part. The psychologists were questioned on such subjects 
as difficulties in administration and scoring, misleading items in the 
scale, items unfair to certain groups, items misplaced, preference as to 
procedure between the old and new scales, and other comparisons 
between the two scales. Although the validity of a subjective study
may be open to question, it would seem that the opinions of ten
psychologists who, in the past year and a half, have administered
approximately five thousand Revised Binet L examinations in the
aggregate, and who, without consulting each other, have come to the
same conclusions on specific test items, are worth considering seriously.

There is not time to list all the specific items mentioned by the 
psychologists, so that only those on which there is some measure of 
agreement will be mentioned. As a matter of fact, if all the items 
that were listed only once were recorded, they would sound like an 
enumeration of all the one hundred twenty-two tests in the new 
scale. Only a few can, therefore, be discussed. 

The verbal absurdities were mentioned most often as too difficult 
to score, too often misinterpreted by children, requiring revision of 
directions, influenced by previous tests, affecting tests that follow, 
unfair to certain elements of the population, and having content which 
is too disturbing to clinic children (violence and sadistic material). 
Other items which present difficulties in scoring are the picture
absurdities (VII, X), the reasons (X) (examples: “Much overlapping







of the two reasons—Terman’s description of scoring and examples
not adequate,” and “How long should we wait for a second
response?”); abstract words (XII, XIV), proverbs (A.A., S.A. II),
reconciliation of opposites (A.A.), and essential similarities (S.A. I).
(Sixteen different items were mentioned as presenting difficulties in
scoring.)

Items which are frequently misinterpreted by children include 
verbal absurdities (all levels), (largely language difficulty); picture 
interpretation (XII) frequently responded to as an absurdity because 
it follows absurdities; absurdities (VIII) following “Wet Fall”
responded to as another story; reasons test (X), problem situation
(XI), vocabulary (roar, puddle, muzzle), paper cutting (IX). (Eighteen
different items were listed as frequently misinterpreted.)

Items considered unfair to New York City children, of which 
twenty-one different ones were named, include verbal absurdities, 
problem situation (XI), problems of fact (XIII), “Wet Fall” (VIII)
(title unfair to non-readers, scoring “In the country” as wrong while
“On the ranch” is correct), opposite analogies (VII)—dependent on
language, Minkus completion unfair to poor readers; entire Binet not 
standardized for colored children. 

Directions need revision on twenty items, according to our psy- 
chologists. The items most frequently mentioned are verbal absurd- 
ities, paper cutting (IX) (revise so child will not think he has to 
draw whole paper); similarities and differences (VIII) (“How long to
wait for differences or similarities on items c and d?”—“How much
questioning permissible?” etc.); bead chains (variable amount of time
for examiner to complete before presentation). 

There were fourteen items mentioned on which the psychologists 
think success is often accidental. The most frequent were mazes 
(VI), plan of search (XIII), directions (XIV), problems of fact (XIII) 
(boy on bicycle), abstract words (XII), picture (XII). 

There was close agreement on the twenty-five items considered 
misplaced and they are sufficiently interesting to list all of them. 


1. Materials (IV)—Too difficult.

2. (a) Sentence (V) (a) “Jane wants to build a big castle ... ”
Word “castle” too difficult for children at this level.

(b) Sentence (V) (b) “Tom has lots of fun...” Too difficult—most
children say “a lot of.”

3. Mazes (VI)—Too easy.

4. Picture comparisons (VI)—Too difficult.

5. Picture absurdities (VII)—Rabbit absurdity, disproportionately hard.




6. Similarities (VII)—parts not of equal value.

7. Opposite analogies (VII)—Too difficult.

8. “Wet Fall” (VIII)—Title too difficult, although test as a whole is easy.

9. Picture absurdity (X)—Too easy.

10. Absurdities at XI, more often failed than those at year XII.

11. Sentence at XI, “At the summer camp” easier than sentences at VIII.

12. Picture interpretation (XII)—Too easy.

13. Minkus completion (XII)—Too difficult.

14. Plan of search (XIII)—Too easy.

15. Word memory (XIII)—Too easy.

16. Paper cutting (XIII)—Too easily passed.

17. Bead chain (XIII)—Not appropriate for older children, especially boys.

18. Induction (XIV)—Too easy.

19. Ingenuity (XIV, and A.A.)—Too easy for school children.

20. Abstract words (XIV)—Too easy.

21. Reconciliation of opposites (A.A. and S.A. II)—Too difficult.

22. Enclosed boxes (S.A. I)—Too easy.

23. Sentence building (S.A. I)—“civility” too difficult.

24. Vocabulary—needs rearranging.

(Children either stop at year VIII or go far beyond it—too easy at XII
and all levels below—after word 8, not scaled for N. Y. C. children—at
upper levels, too much credit for three words, e.g. the difference between
twenty-five and twenty-six words at S.A. II is five months.)

25. Years VIII and XI—Too difficult for children with poor verbal background.




In general, then, these psychologists believe that there is something 
radically wrong with the content of the Revised Form L—that the 
verbal material is not fair to New York City children, that many 
tests are misplaced in the scale, that some material is not appropriate 
for clinic children, that instructions for administration are frequently 
inadequate, although, in the main, simpler than instructions for the 
old Stanford-Binet, and that criteria and scoring instructions are too 
often inadequate. Furthermore, although there is divided opinion 
on the matter of year-by-year testing, most of the psychologists agree 
that, in a clinic situation, more flexibility of administration is necessary. 


PART 2 


For the second part of this study, a rapid survey of one thousand 
two hundred reports of Revised L examinations in one of our clinics 
was made, and this showed one hundred eighty-one cases (fifteen per 
cent) with either a double basal year, a double final, or both, and, 
in eight cases, three or four basal or final years. Psychologists in 
other units of the Bureau were asked to obtain rough figures for this 
by running through their cases, and all the estimates ran between







fifteen per cent and twenty per cent for the total of more than one 
basal or final year. For five thousand cases examined in the Bureau
by this scale in one and one-half years, this means between seven 
hundred fifty and one thousand cases in which the examinations 
would not be complete if one stopped at a single basal or final year. 
It must be noted that the number would be even higher if one were 
working under research rather than clinic conditions; in the latter, time 
is an important factor. The writer has no figures for the old Binet 
scale, but it could not have been as high. 
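The percentages in this paragraph can be checked directly. A sketch using only the figures stated in the text (integer arithmetic avoids floating-point rounding in the case counts; the function names are illustrative):

```python
def percent(part: int, whole: int) -> float:
    return 100 * part / whole


def case_range(total: int, low_pct: int, high_pct: int) -> tuple:
    """Number of cases implied by a percentage range of a total."""
    return total * low_pct // 100, total * high_pct // 100


print(f"{percent(181, 1200):.1f}%")  # 15.1%, reported as fifteen per cent
print(case_range(5000, 15, 20))      # (750, 1000)
```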

In many cases, the double basal or final year does not materially 
affect the entire score, but in others the difference is considerable. 
When twenty of the one hundred eighty-one cases with double final
years were sampled at random and the IQ’s recomputed, the differences
ranged from one to twelve points, twelve cases showing differences of
five points or less and three showing differences of more than ten points.
Seven of the twenty re-evaluated IQ’s changed the classification. The 
greatest differences occurred with young bright children with superior 
verbal ability. 
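The effect of stopping at the first final year can be sketched with the ratio IQ, IQ = 100 × MA / CA, where MA is the basal age plus credit for every test passed above it. The two-month credit per test, the child, and the pass counts below are all hypothetical, chosen only to show how successes beyond a completely failed year change the result.

```python
def mental_age_months(basal_years, levels, stop_at_first_final=True):
    """Mental age in months from a basal year plus credits above it.

    basal_years: highest year at which every test was passed.
    levels: (months_per_test, tests_passed) for each successive year above the basal.
    stop_at_first_final: if True, stop at the first year with no passes,
        as an examiner would when taking a single final year.
    """
    months = basal_years * 12
    for months_per_test, passed in levels:
        if passed == 0 and stop_at_first_final:
            break
        months += months_per_test * passed
    return months


def ratio_iq(ma_months, ca_months):
    return round(100 * ma_months / ca_months)


# Hypothetical child: CA 8-0 (96 months), basal year VII, two months per test assumed.
# Tests passed at years VIII through XII: 3, 1, 0, 2, 0.
levels = [(2, 3), (2, 1), (2, 0), (2, 2), (2, 0)]
single = mental_age_months(7, levels)                             # 92 months
double = mental_age_months(7, levels, stop_at_first_final=False)  # 96 months
print(ratio_iq(single, 96), ratio_iq(double, 96))  # 96 100
```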


PART 3 


For the third part, a study of ninety cases in which the old and 
the new scales were administered was made, and this revealed that the 
differences were from zero to twenty-seven points in IQ, in both 
directions. 

Tables I, II, and III show the extent of these differences.


TABLE I.—COMPARISON BETWEEN IQ’S ON STANFORD-BINET AND REVISED STANFORD-BINET, FORM L

Comparison | Number | Average difference, points
New examination higher than old | 40 | 9.0
New examination lower than old | 39 | 7.3
No difference | 11 |
Total | 90 | 7.1











Classification (e.g. dull to average, average to superior, or vice
versa) was changed in twenty-five cases (twenty-eight per cent) by
re-examination with the new scale. In fifteen cases (seventeen per
cent) the change in classification was significant for practical purposes.
In practically all of the ninety cases, at least a year elapsed between
the administration of the old and new scales, and, in some cases, as
many as five years elapsed.
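Whether a re-examination changes the classification depends on where the cut-offs fall, not only on the size of the IQ difference. The sketch below assumes Terman's 1916 descriptive bands, with each lower bound taken as inclusive; the Bureau's own cut-offs are not given in the article, and the example IQ's are hypothetical.

```python
# Assumed bands after Terman (1916); lower bounds treated as inclusive.
BANDS = [
    (140, "near genius or genius"),
    (120, "very superior"),
    (110, "superior"),
    (90, "average"),
    (80, "dull"),
    (70, "borderline"),
    (0, "feeble-minded"),
]


def classify(iq: int) -> str:
    for lower, label in BANDS:
        if iq >= lower:
            return label
    raise ValueError("IQ must be non-negative")


def classification_changed(old_iq: int, new_iq: int) -> bool:
    return classify(old_iq) != classify(new_iq)


print(classification_changed(108, 112))  # True: a 4-point gain crosses 110
print(classification_changed(95, 104))   # False: a 9-point gain stays "average"
print(f"{100 * 25 / 90:.0f}% and {100 * 15 / 90:.0f}%")  # 28% and 17%
```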


TABLE II.—PER CENT OF CASES WITH DIFFERENCES OF FIVE POINTS OR LESS

Difference | N | Per cent
(label illegible) | 11 | 12
(label illegible) | 45 | 50











TABLE III.—ANALYSIS OF DIFFERENCES

Point difference | N | Per cent
(illegible) | 67 | 74.5
(illegible) | 15 | 16.7
(illegible) | 4 | 4.4
(illegible) | 4 | 4.4
Total | 90 | 100
Middle 50 per cent 2.4-11.8; Q = 4.7
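Q in Table III is the semi-interquartile range, half the spread of the middle 50 per cent of the differences. From the stated quartiles (the function name is illustrative):

```python
def semi_interquartile_range(q1: float, q3: float) -> float:
    """Q = (Q3 - Q1) / 2, half the width of the middle 50 per cent."""
    return (q3 - q1) / 2


print(round(semi_interquartile_range(2.4, 11.8), 1))  # 4.7
```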











The greatest discrepancies were at the superior levels, where the 
classification was not changed significantly. The four cases with 
changes of more than twenty points (three higher and one lower), 
were all difficult clinic cases in which there was much instability and 
the psychologist knew the child had not been reached by the exami- 
nation, even by the higher results. 

Conclusions for the country at large certainly cannot be drawn 
from these few cases—nor even for New York City—but, in the main, 
children at lower-age levels showed lower results on the new scale, 
while older children showed higher results on re-examination with the 
new scale. Results seemed disproportionately high in cases of 
superior children with good verbal ability, while those with poor verbal 
ability, or from foreign homes, were unduly penalized. Furthermore, 
these ninety cases are clinic cases, and were re-examined not to determine
the reliability of the test, but because, in each case, the psychologist
had a reason for re-examining, usually feeling that the original
examination, for one reason or another, was not valid.


TABLE IV.—DISTRIBUTION OF IQ’S, STANFORD-BINET OLD AND NEW SCALES
(Columns include School C and School D, old and Revised L; Total, Old Binet, number and per cent; Total, Form L, number and per cent; and Per cent “Normal” (Terman, 1916). The entries are not legible in this copy.)






















PART 4 


For the fourth part of this study, an examination of school surveys 
was made. In some schools in New York City, our clinic has been 
examining individually all the children of the lower grades as part of 
a mental hygiene program. There were four schools in which we had 
random samplings of the school populations (examinations of entire 
grades—I, II, III, or all three) which had been examined by the old 
and the new scales. In these schools, one thousand three hundred 
sixty-one children had been through the old scale, and four hundred 
three had had Form L. The four schools are in different parts of the 
city, and have different types of populations, so that the distributions 
and the statistical measures of the results varied from school to school, 
but one fact stood out in all—the median in each case was lower on the 
new than on the old scale. The median differences were one, two,
three and seven points, respectively, the order being one of sample
size: the larger the sample, the smaller the median difference.
For the entire population, the medians were 103.7 and 100.7,
respectively, for the old and the new scales, with PE’s of the median of
0.33 and 0.5, respectively, and with a Diff./PE_diff of 5.0, that is, a
very reliable difference. For the individual schools this ratio varied
from one to two, indicating that the median differences between the
two scales for separate schools were not as reliable as that for the
entire group. The means of the old and new scales were 103.1 and
100.8, with a Diff./SD_diff of 3.1; the SD’s were 13.7 and 13.4,
respectively. The difference between mean and median for the old scale
was 0.6 points, and for the new scale 0.1 points. The distribution of
IQ’s on the new scale approximates Terman’s “normal” distribution
much more closely than does that of the old scale. The new scale
distribution is almost perfect in this respect. That four hundred
three cases should give so perfect a distribution is quite startling.
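The ratio for the medians can be reproduced from the figures given, assuming the two samples are independent so that the probable errors combine as the square root of the sum of their squares. The same function works for standard errors; the function name is illustrative.

```python
import math


def critical_ratio(stat1: float, stat2: float, err1: float, err2: float) -> float:
    """Difference of two independent statistics over the error of the difference.

    For independent samples the errors combine as sqrt(err1**2 + err2**2).
    Pass probable errors to get Diff./PE, or standard errors to get Diff./SE.
    """
    return (stat1 - stat2) / math.hypot(err1, err2)


# Medians of the old scale and Form L, with their PE's, as reported.
ratio = critical_ratio(103.7, 100.7, 0.33, 0.5)
print(round(ratio, 1))  # 5.0
```

Since a probable error is 0.6745 of a standard error, a Diff./PE of 5.0 corresponds to a difference of roughly 3.4 standard errors.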

Care must be taken that any conclusions arrived at from these 
figures be limited to the lower levels of the scale, since they were 
based on the first three grades only. It seems clear that Terman 
achieved his goal of improving the lower years of the Binet exami- 
nation, which had previously given results generally recognized as 
somewhat too high. 


CONCLUSIONS 


In conclusion, the experience at the Bureau of Child Guidance 
with Form L of the Revised Stanford-Binet scale would indicate that 
it is much superior to the old scale statistically; that it is better 
standardized and better validated, as a whole; that it eliminates suc- 
cessfully many of the objections to the old by including new lower 
levels, extending the upper, and filling in the gaps; that the lower 
levels no longer give results that seem too high, and the upper levels, 
results that seem too low; that the dispute over using thirteen, four- 
teen, fifteen or sixteen for maximum CA for adults has finally been 
settled; that, on the one hand, preschool and very dull kindergarten 
and first-grade children can now be examined without the use of an 
additional preschool battery, and, on the other hand, the superior 
adolescents can be reached; that directions have, in general, been 
simplified; that, in the main, it is a much more refined psychological 
scale, but, apparently, so much attention has been paid to these refine- 
ments in the process of revision, that some of the old weaknesses in 
content were not eliminated, and others have crept in. 

These are some of the weaknesses: 

(1) Longer time required for administration—twenty-five to 
thirty per cent more time, on the average. 

(2) Emphasis on verbal material still present, possibly to a 
greater extent, especially in middle and upper levels. (Admitted 
by Terman.) 

(3) Years VIII and XI seem especially poor in this respect. 

(4) Rote memory still seems emphasized too much at the upper 
levels. 

(5) Possibly because of refinements, there is considerably more 
scatter on the new scale. 

(6) A single basal or final year is not as conclusive as it was on the 
old scale. 

(7) In attempting to simplify directions, confusion to the child 
has resulted on some tests. 

(8) Scoring directions and criteria are sometimes not clear. 

(9) Many tests seem misplaced for New York City children. 

(10) Many situations and many words seem unfair to New York 
City children. 

(11) Much of the content is unsuited to clinic children who show 
emotional disturbances. 

(12) For clinic work, more flexibility in administration would be 
desirable. Since in the year-by-year testing, the easy material is 
presented all at one time, and the difficult material at another, con- 
secutive failure introduces an additional emotional stress. 

All in all, then, Form L of the new scale seems to be a better con- 
structed instrument, much superior to the old for survey purposes, and 
possibly even for clinical purposes, but with many shortcomings 
which, if corrected, would make it much more satisfactory for clinic 
use. 


BIBLIOGRAPHY 


1. Terman, Lewis M.: The Measurement of Intelligence. Boston: Houghton 
Mifflin, 1916. 

2. Terman, Lewis M. and Merrill, Maud A.: Measuring Intelligence. Boston: 
Houghton Mifflin, 1937. 








THE RELATION BETWEEN FREQUENCY OF TESTING 
AND PROGRESS IN LEARNING PSYCHOLOGY 


C. C. ROSS 
University of Kentucky 
AND 
LYLE K. HENRY 


Iowa State College 


THE PROBLEM 


There is no principle of teaching which has a sounder psychological 
basis than this: To be effective, teaching must always begin where the 
learner’s present knowledge leaves off. Failure to observe this 
principle results in foolish attempts to do two impossible things. One 
of these is attempting to teach a student what he already knows, and 
the other is attempting to teach him on a level too far beyond his 
present knowledge. Both are equally futile. 

The most economical method of determining the status of the 
learner’s knowledge is by some form of test or written examination. 
In the last thirty years these measuring instruments have been greatly 
extended and improved. It appears that so far, however, more atten- 
tion has been given to perfecting the tools than to developing the 
techniques needed for their efficient use. 

What, for example, is the proper balance between teaching and 
testing in a college class, for manifestly the time devoted to the one 
must be taken from the other? Practice varies between the widest 
possible extremes. Some instructors give only a final examination 
while others give some sort of test each time the class meets. What 
relationship exists between the frequency of testing and progress in 
learning? The present study attempts an answer to this question for 
classes in psychology taught by two different instructors at Iowa 
State College.


A BRIEF SUMMARY OF RELATED STUDIES 


Several studies have already sought answers to this question.
Jones¹ in a pioneer study at Columbia University gave five-minute
completion tests, euphoniously called “terminal reviews,” at the end
of each of twenty-seven lectures in general psychology. The groups
so tested made scores on a final examination eight weeks later that
averaged approximately twice those of the control group. Jones
explained this superiority as follows: “Examination strengthens con-
nections. But the later an examination after the original lecture, the
fewer are the connections which remain to be strengthened.” That
this argument does not always hold is indicated by a recent study¹ at
the University of Minnesota on the value of weekly tests in psychology,
which gave negative results.

1 Jones, Harold E.: “Experimental Studies of College Teaching.” Archives
of Psychology, Vol. LXVIII, November, 1923, pp. 36-70.

Turney,² Keys,³ and Smeltzer⁴ have each conducted studies which
indicated that better results were obtained in educational psychology
when students were tested once a week than when they were tested
less frequently. The immediate superiority for the three studies was
about twenty, twelve, and five per cent, respectively. Keys attempted
to check upon the later effects also. He found that on an unannounced
examination covering the same material, given five weeks later, the
advantage for the group tested weekly had been reduced to seven per
cent. Two weeks later, on the final examination, the experimental and
control groups made practically identical scores. This raises the
question whether any apparent advantage of the weekly tests may be
only temporary.

Noll⁵ recently reported a study which showed that a class in educa-
tional psychology which had only the mid-term and final examinations
was slightly superior to a class of equivalent ability which had in
addition four written tests lasting from ten minutes to an hour. There
was evidence that the less able students profited more from the addi-
tional tests. Kirkpatrick⁶ also found that in twenty-six high-school
physics classes the pupils who represented the lowest third in ability
profited most from tests given two or more times each week.

1 Eurich, A. C., Longstaff, H. P., and Wilder, M.: “The Effects of Weekly Tests
Upon Achievement in Psychology.” The Effective College Curriculum as Revealed
by Examinations. Minneapolis: University of Minnesota Press, 1937, pp. 333-347.

2 Turney, Austin H.: “The Effect of Frequent Short Objective Tests upon the
Achievement of College Students in Educational Psychology.” School and Society,
Vol. XXXIII, June 6, 1931, pp. 760-762.

3 Keys, Noel: “The Influence on Learning and Retention of Weekly as Opposed
to Monthly Tests.” Journal of Educational Psychology, Vol. XXV, September,
1934, pp. 427-436.

4 Pressey, Sidney L.: Psychology and the New Education. New York: Harper &
Bros., 1933, pp. 363-366.

5 A paper read at the American Psychological Assn., September 8, 1938.

6 Kirkpatrick, James Earl: “The Motivating Effect of a Specific Type of
Testing Program.” University of Iowa Studies in Education, Vol. IX, 1934, pp.
41-68.
The evidence, therefore, appears somewhat conflicting. On the
whole it seems to suggest that the immediate effects of frequent testing
are beneficial, especially to the weaker students, but that the delayed
effects may be considerably less.

Upon one point, however, the evidence is most conclusive. Jones,
Turney, Keys, and Noll all found that at least seventy per cent of the
students preferred the frequent tests to the infrequent ones, and felt
they had learned more because of them. Contrary to a popular belief,
many short tests apparently add to the student’s enjoyment of a
course, if not to his education.


THE PROCEDURE 


The experiment was performed in the writers’ classes in psychology
at Iowa State College during the Winter quarter of 1937. Instructors
R and H each had two sections of general psychology.¹ At the first
meeting of the classes an objective examination over the entire course
was given to all groups. This test included sixty true-false items,
fifty multiple-choice items, and thirty-nine matching items, and had a
reliability of .83 by the odd-even technique. Grade-point average
for the previous quarter was obtained for all groups. No attempt was
made at strict matching of groups, but the initial differences are
usually slight and are taken into account in the interpretation.

1 The text was the 1935 revision of Dockeray’s General Psychology.

All groups were taught by the informal lecture and discussion
method. All groups received a one-hour test at mid-term and the
same two-hour final examination. The only difference in procedure was
that for each instructor one group, which we shall designate as the
experimental, received in addition a weekly test of thirty objective
items over the week’s work. The test was usually given during the
last twenty minutes of the last period of the week. The papers were
handed back and discussed briefly at the beginning of the following
period. Nine of these tests were administered during the quarter,
none being given during the weeks of the mid-term and final examinations.

The final examination given to all groups consisted of the retest
plus other material. This additional material consisted of three types
of items: Part I, items which actually appeared in the weekly quizzes
(30 points); Part II, items which appeared in the quizzes but which
were recast into different types of objective items (20 points); and
Part III, identification and interpretation of figures and drawings
which were in the text but not in any previous quiz or examination
(52 points). It was expected that the experimental groups would do
better than the control groups on Parts I and II, since the former had
actually encountered the questions before. The real criteria of
accomplishment were presumed to be the retest and Part III, while
Parts I and II were used for comparative purposes only.

For the purpose of further checking the issue involved in this
experiment, Instructor H used a similar plan with two sections of
educational psychology taught during the same quarter. Later, sections
of general psychology, taught by the same instructor and using the same
methods in both sections, were compared to determine whether
differences would occur even when the method did not vary.


THE RESULTS 


The results in general psychology with Instructor R are presented 
in Table I. There were forty-one usable records from each class. It 
will be noted that these two fairly large classes were practically equi- 
valent on grade-point average for the preceding quarter. In fact, the 
chances are only fifty-one out of one hundred that the experimental 
group was superior in this respect. According to the pre-test scores, 
the control group was definitely superior, there being eighty-three 
chances out of one hundred that the difference is not due to chance. 
On the end-test, however, the experimental group was superior, which 
means that the gain made was very much greater for this group. As 
this test had one hundred forty-nine items, the gain for the experi- 
mental group was 53.9 per cent of that possible, while the gain of the 
control group was only 45.5 per cent of that possible. For Parts I and 
II the critical ratios are large and favor the experimental group, as 
would be expected. Part III, likewise, indicates the superiority of the 
group having frequent tests, although the differences are less marked. 
The chances are ninety-five out of one hundred that the experimental 
group on this part of the test is superior to the control group. 
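The percentages of possible gain can be reproduced by dividing each group's mean gain by the room left above its mean pre-test score on the 149-item test. The sketch below makes that assumption. It matches the reported figures, but the text does not spell out the formula, and the helper name is illustrative.

```python
MAX_SCORE = 149  # 60 true-false + 50 multiple-choice + 39 matching items


def percent_of_possible_gain(pre_mean: float, gain_mean: float,
                             max_score: int = MAX_SCORE) -> float:
    """Mean gain as a percentage of the gain still available after the pre-test (assumed formula)."""
    return 100.0 * gain_mean / (max_score - pre_mean)


# Instructor R (Table I): mean pre-test score and mean gain
r_experimental = percent_of_possible_gain(58.07, 49.00)
r_control = percent_of_possible_gain(65.78, 37.88)
# Instructor H (Table II)
h_experimental = percent_of_possible_gain(65.00, 49.00)
h_control = percent_of_possible_gain(63.00, 44.00)

for label, value in [("R experimental", r_experimental), ("R control", r_control),
                     ("H experimental", h_experimental), ("H control", h_control)]:
    print(f"{label}: {value:.1f} per cent")
```

The same function gives the 58.3 and 51.2 per cent reported for Instructor H's classes.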

Table II presents the same data for the classes of Instructor H in 
general psychology. There are several differences between the classes 
reported here and those in Table I. Smaller groups were used and the 
experimental group at the outset was somewhat superior both on the 
grade-point average for the preceding semester and on the pre-test. 
Inspection of all scores presented in Table II indicates that the margin 
of superiority of the experimental group had been increased. The 
experimental group excelled the control group by an amount .72 times
the PE of the difference on the pre-test and 2.58 times the PE of the
difference on the end-test. The gain represented 58.3 per cent of that
possible, as against 51.2 per cent for the control group. These results
for the two small classes in general psychology would appear to confirm
those for the larger classes. Both instructors appear to get better
results when the weekly tests are used.


TABLE I. A COMPARISON OF THE ACHIEVEMENT IN GENERAL PSYCHOLOGY OF
THE EXPERIMENTAL AND CONTROL GROUPS (INSTRUCTOR R)

Measure                  Experi-   Con-     Differ-                       Chances in one hundred
                         mental    trol     ence¹     PE_diff  D/PE_diff  of real difference
Grade points     Mean      2.05     2.00      .05       .11      .04          51
                 SD         .74      .69      .05
Pre-test         Mean     58.07    65.78     7.71      1.84     1.38          83
                 SD       13.58    11.04     2.54
End-test         Mean    107.07   103.67     3.40      2.07     1.64          87
                 SD       13.66    14.11      .45
Gain             Mean     49.00    37.88    11.12      2.31     4.81         100
                 SD       18.43    11.80     6.63
Part I, final    Mean     22.29    16.02     6.27       .69     9.08         100
                 SD        4.68     4.54      .14
Part II, final   Mean     13.07    10.66     2.41       .47     5.13         100
                 SD        3.24     3.07      .17
Part III, final  Mean     34.08    31.24     2.84      1.14     2.49          95
                 SD        7.86     7.38      .48
Number of students           41       41

1 All differences favor the experimental group except those on the pre-test.





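The PE_diff column of Table I can be reconstructed, at least approximately, from the standard deviations and the 41 cases per group. The sketch below assumes the conventional formulas: PE of a mean = .6745 σ/√N, and PE_diff = √(PE₁² + PE₂²) for two independent groups. The article does not state its formulas, and the function names are illustrative. For the End-test and Gain rows this reproduces the printed values.

```python
import math

PE_FACTOR = 0.6745  # one probable error expressed in standard-error units


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a mean (assumed conventional formula)."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_of_difference_of_means(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """PE of the difference between the means of two independent groups."""
    return math.hypot(pe_of_mean(sd_a, n_a), pe_of_mean(sd_b, n_b))


# End-test row of Table I: SD's of the two groups, 41 students each
pe_end = pe_of_difference_of_means(13.66, 41, 14.11, 41)
ratio_end = (107.07 - 103.67) / pe_end
# Gain row of Table I
pe_gain = pe_of_difference_of_means(18.43, 41, 11.80, 41)

print(f"End-test: PE_diff = {pe_end:.2f}, D/PE_diff = {ratio_end:.2f}")
print(f"Gain:     PE_diff = {pe_gain:.2f}")
```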
It is worth noting, however, that this superior gain for the experi-
mental group was much more marked for weaker students. The ten
students in the experimental group under Instructor R who were lowest
on the pre-test made a mean gain of 68.1 points, as compared with a
mean gain of only 36.1 points for the ten students who were highest on
the pre-test. The corresponding gains for the ten low and high students
in the control group were 39.3 and 32.8 points, respectively. In like
manner the five lowest students in the experimental group under
Instructor H made a mean gain of 60.0 points, as compared with a mean
gain of only 43.8 points for the five highest students. The corresponding
gains for the five low and high students in the control group were 42.0
and 43.2 points, respectively.


TABLE II. A COMPARISON OF THE ACHIEVEMENT IN GENERAL PSYCHOLOGY OF
THE EXPERIMENTAL AND CONTROL GROUPS (INSTRUCTOR H)

Measure                  Experi-   Con-     Differ-                       Chances in one hundred
                         mental    trol     ence¹     PE_diff  D/PE_diff  of real difference
Grade point      Mean      2.07     2.01      .06       .14      .43          62
                 SD         .51      .76      .25
Pre-test         Mean     65.00    63.00     2.00      2.76      .72          68
                 SD       14.65    11.57     3.08
End-test         Mean    114.00   107.00     7.00      2.71     2.58          96
                 SD       10.71    14.45     3.74
Gain             Mean     49.00    44.00     5.00      2.56     1.95          91
                 SD       10.85    18.05     7.20
Part I, final    Mean     24.05    13.50    10.55      1.02    10.34         100
                 SD        3.58     5.80     2.22
Part II, final   Mean     12.10    11.40      .70       .62     1.12          77
                 SD        2.78     3.08      .30
Part III, final  Mean     33.30    32.25     1.05      1.53      .69          68
                 SD        5.67     8.44     2.77
Number of students           20       20

1 All differences favor the experimental group.


Table III, however, gives a different picture. The data presented 
show the achievement of two classes in educational psychology taught 
by Instructor H. The classes were practically equal in grade-point 
average for the preceding quarter, but the experimental group was 
superior on the pre-test based on general psychology and on the George 
Washington University teaching aptitude test scores. Contrary to 
expectations, the control group was slightly superior on the final 
examination, which was the only criterion of achievement used. 

It is recognized that this result is somewhat lacking in significance, 
due to the small number of subjects and also to the fact that the 
reliability of the final examination was not known. It does, however, 
again indicate that frequent testing is less effective with higher-ranking 
students.


TABLE III. A COMPARISON OF THE ACHIEVEMENT IN EDUCATIONAL PSYCHOLOGY
OF THE EXPERIMENTAL AND CONTROL GROUPS (INSTRUCTOR H)

Measure                    Experi-   Con-     Differ-                       Chances in one hundred
                           mental    trol     ence      PE_diff  D/PE_diff  of real difference
Grade point        Mean      2.23     2.29      .06      1.21      .04          51
                   SD         .64      .48      .16
Pre-test           Mean     96.50    91.35     5.15      3.24     1.58          86
                   SD       18.52    10.90     7.62
Teaching aptitude  Mean    150.85   147.30     3.55      3.17     1.12          77
                   SD       15.96    13.66     2.30
Final              Mean    100.30   105.35     5.05      2.12     2.37          95
                   SD       11.34     8.36     2.98
Number of students             20       20




















In view of this situation it was decided to teach sections in general
psychology by the same methods, and compare results.¹ This was
done during the following Fall quarter of 1937 by Instructor H. There
were two pairs of classes whose grade-point averages for the preceding
quarter were the same. A modified teaching procedure was used in
which, instead of weekly tests, three short quizzes were given to all
students in addition to the mid-term and final examinations. The basis
of comparison was the class mean of the sums of scores on all tests
given in the course. Between two of the classes the difference in mean
scores was 7.37 ± 5.52, and between the other two classes the difference
was 5.07 ± 7.54. Although neither difference is statistically significant,
the chances are eighty-two in one hundred that the first one is not due
to chance. It is apparent, therefore, that differences approaching
significance are likely to occur in the achievement of classes of appar-
ently equal potentialities, even when taught by the same method.

1 The text was Dashiell’s Fundamentals of Objective Psychology.

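The "chances in one hundred" quoted above and in the tables can be read from the normal curve once D/PE is converted to standard-error units (1 PE = .6745 standard errors). The sketch below is a minimal illustration. It assumes normally distributed sampling errors and uses the standard library's `math.erf` for the normal curve. It agrees with the reported figures to within rounding.

```python
import math

PE_FACTOR = 0.6745  # one probable error expressed in standard-error units


def chances_in_hundred(diff: float, pe_diff: float) -> float:
    """Chances in 100 that the true difference has the same sign as the observed one."""
    z = PE_FACTOR * diff / pe_diff                        # D/PE converted to a normal deviate
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))   # normal cumulative probability times 100


first_pair = chances_in_hundred(7.37, 5.52)    # 7.37 ± 5.52
second_pair = chances_in_hundred(5.07, 7.54)   # 5.07 ± 7.54
part_iii_r = chances_in_hundred(2.84, 1.14)    # Part III row of Table I
print(round(first_pair), round(second_pair), round(part_iii_r))
```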

A BRIEF SUMMARY AND CONCLUSION 


This study attempts to determine the relation between frequency 
of testing and progress in learning psychology. The achievement of 
classes in general and in educational psychology at Iowa State College 
which were given a short test each week is compared with that of 
similar classes which had only the mid-quarter and final tests. The 
most important findings are as follows: 

(1) In general psychology the mean achievement of the two experi- 
mental groups is significantly greater than that of the two control 
groups. This superiority, however, is most marked for the pupils 
whose pre-test scores were in the lowest fourth of their classes. 

(2) In educational psychology the achievement of the control 
group is slightly greater than that of the experimental group. 

The results of this study, when considered with those of similar 
studies already reported, strongly suggest that there is no one best 
testing technique which is equally effective under all conditions any 
more than there is one best teaching procedure which is universally 
superior to all competitors. Apparently, methods to be most success- 
ful must vary not only with the nature of the subject taught but also 
with the ability of the student. The data presented are added testi- 
mony to the difficulty and complexity of the teacher’s task. 








EFFECT OF THE HIGH SCHOOL ON COLLEGE GRADES 
PAUL L. DRESSEL 
Michigan State College, East Lansing, Mich. 


The statistical procedures implied by the phrases Analysis of 
Variance and Analysis of Covariance have been much used in recent 
years in agricultural research, but, for one reason or another, have been 
neglected almost entirely by investigators of educational problems. 
In his book, Analysis of Variance and Covariance, G. W. Snedecor 
treats (page 72) a problem which suggested most of the procedure 
followed in this article. Reference should be made also to Statistical 
Methods for Research Workers by R. A. Fisher, since many of the 
procedures and most of the tests of significance used are completely
discussed in the latter book.

The particular problem studied here had its inception in a question 
raised by L. C. Emmons, Dean of the Division of Liberal Arts at 
Michigan State College. The question was: “Is it possible to rank 
high schools on the basis of the college success of their graduates, and 
is this ranking stable enough to be used in predicting the college grades 
of high-school graduates?” After some discussion it was decided to 
make a preliminary study and, from the results found, decide whether 
a more extensive study would be worth while. The results of 
this preliminary study are here presented. 

In beginning the study it was immediately obvious that only those 
high schools that contribute regularly a group of freshmen to Michigan 
State College could be studied. Moreover, in order to obtain a sufficient
number of students from each high school, it was necessary to
consider more than one year. Since too long a period might introduce 
difficulties, a four-year period, 1934-1937, inclusive, was taken. 
Fifteen high schools were selected, largely on the basis of the number 
of graduates enrolling at M.S.C. With but one exception, these were 
large city high schools. The schools will be designated by a letter 
rather than by name.

The numbers of students from the different schools varied from 
twenty-two to one hundred twelve, with a total of eight hundred ten 
students. The high-school record of each of these students was 
obtained and a point average was computed from this in the same 
fashion that a point average is computed at M.S.C., that is: One hour
of A gives three points; B, two points; C, one point; D, no points; F, 
negative one point. For computing the high-school average, one 
semester was substituted for one hour. The college average of each 
student for the Fall term of his freshman year was also obtained. 


TABLE I. ANALYSIS OF COVARIANCE

                              Degrees  Sums of squares and       Mean squares    Correlation  Regression of college
Source of variation           of       products                                  coefficient  grades on high-school
                              freedom  H²       C²       HC      H²      C²                   grades
Total                          809     302.41   512.30   205.36   .373    .633    .5217        .6791
Among means of high schools     14      28.81    19.51   −11.38  2.058   1.394   −.4800       −.3950
Within high schools            795     273.60   492.79   216.74   .344    .620    .5903        .7922
The data for these students were analyzed by the methods of analysis
of covariance, and tests of significance were applied. Table I shows
the results of this analysis applied to the high-school point averages
(H) and the first-term college point averages (C). Denoting the mean
high-school point average by $\bar{H}$ and the mean college point
average by $\bar{C}$, the total sums of squares of the first row are
given, respectively, by

$$\sum_{i=1}^{810} (H_i - \bar{H})^2 \qquad \text{and} \qquad \sum_{i=1}^{810} (C_i - \bar{C})^2,$$

while the total sum of products is given by

$$\sum_{i=1}^{810} (H_i C_i - \bar{H}\bar{C}).$$

The mean squares are obtained by dividing the sum of squares by the
number of degrees of freedom. For computing the sums of squares
among means of high schools, the mean high-school point average and
the mean college point average for all students from each high school
are used. Denote the mean high-school point average for the $i$th
high school by $\bar{H}_i$, the mean college point average for the
$i$th high school by $\bar{C}_i$, and the number of students from the
$i$th high school by $n_i$. The sums of squares among means of high
schools are given, respectively, by

$$\sum_{i=1}^{15} n_i(\bar{H}_i - \bar{H})^2 \qquad \text{and} \qquad \sum_{i=1}^{15} n_i(\bar{C}_i - \bar{C})^2,$$

while the sum of products is given by

$$\sum_{i=1}^{15} n_i(\bar{H}_i\bar{C}_i - \bar{H}\bar{C}).$$

The mean squares are again obtained by dividing the sum of squares by
the degrees of freedom. The sums of squares and products within high
schools are obtained by subtracting the sums of squares and products
among means from the corresponding total sums. Correlations and
regressions are computed in the usual way. In the total mean square
no consideration is given to high schools. In the mean square among
means of high schools, only the variation in the mean point averages is
considered. The mean square within high schools is an average of the
variances of point averages for each of the fifteen high schools. For
further details as to the computations involved, the reader is referred
to Part IV of Analysis of Variance and Covariance.

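The partition just described can be sketched in a few lines of Python. The function below is hypothetical and is not taken from the article. It takes one (school, H, C) record per student, weights each school's deviation by its number of students, and obtains the within-school sums by subtraction, as in the text. The five-student data set is invented purely for illustration.

```python
from collections import defaultdict


def covariance_partition(records):
    """Split the sums of squares (H, C) and the sum of products HC into
    total, among-school-means, and within-school parts.

    records: list of (school, h, c) tuples, one per student (hypothetical helper).
    Returns (sums, dof); each entry of sums is (SS_H, SS_C, SP_HC).
    """
    n = len(records)
    h_bar = sum(h for _, h, _ in records) / n
    c_bar = sum(c for _, _, c in records) / n

    total = (
        sum((h - h_bar) ** 2 for _, h, _ in records),
        sum((c - c_bar) ** 2 for _, _, c in records),
        sum((h - h_bar) * (c - c_bar) for _, h, c in records),
    )

    by_school = defaultdict(list)
    for school, h, c in records:
        by_school[school].append((h, c))

    # Among means: each school's mean deviation, weighted by its number of students
    among = [0.0, 0.0, 0.0]
    for members in by_school.values():
        n_i = len(members)
        h_i = sum(h for h, _ in members) / n_i
        c_i = sum(c for _, c in members) / n_i
        among[0] += n_i * (h_i - h_bar) ** 2
        among[1] += n_i * (c_i - c_bar) ** 2
        among[2] += n_i * (h_i - h_bar) * (c_i - c_bar)

    # Within high schools: totals minus the among-means sums
    within = tuple(t - a for t, a in zip(total, among))
    k = len(by_school)
    sums = {"total": total, "among": tuple(among), "within": within}
    dof = {"total": n - 1, "among": k - 1, "within": n - k}
    return sums, dof


# Invented illustration: two schools, five students
sample = [("X", 1.0, 0.5), ("X", 2.0, 1.5), ("Y", 1.5, 1.0), ("Y", 2.5, 2.5), ("Y", 3.0, 2.0)]
sums, dof = covariance_partition(sample)
mean_squares = {src: (ss_h / dof[src], ss_c / dof[src]) for src, (ss_h, ss_c, _) in sums.items()}
```

With 810 students from fifteen schools, the same degrees-of-freedom rule gives the 809, 14, and 795 of Table I.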
Probably the outstanding feature of Table I is the large mean 
square among means of high schools as compared with the mean 
square within high schools. This is true both for the high-school 
grades and the college grades. For college grades the ratio of the mean 
squares is 1.39/.620 = 2.25 and for high-school grades the ratio is 
2.06/.344 = 5.98. A ratio of 2.20, according to the tables on page 88 
of Analysis of Variance and Covariance, is highly significant. From 
this we may conclude that groups of students from different high 
schools differ significantly both in their high-school and college grades. 
It should be noted that the mean square for high-school grades among 
means of high schools is considerably larger than the mean square for 
college grades among means of high schools. This indicates that, from 
the college viewpoint, there was not nearly so much difference between 
the groups from different schools as the high-school grades would 
indicate. This circumstance is more surprising when we recall that 
students entering college ordinarily have at least a C average in high 
school, and thus we should expect the mean square for high-school 
grades to be less than that for college grades. For mean squares 
within high schools the latter relationship does hold. 
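The two variance ratios follow directly from the sums of squares and degrees of freedom in Table I. The short check below recomputes them. It does not reproduce Snedecor's tables, so the judgment of significance still rests on those tables.

```python
# Sums of squares and degrees of freedom from Table I
ss_among = {"high-school": 28.81, "college": 19.51}
ss_within = {"high-school": 273.60, "college": 492.79}
df_among, df_within = 14, 795

f_ratio = {}
for grades in ss_among:
    ms_among = ss_among[grades] / df_among      # mean square among means of high schools
    ms_within = ss_within[grades] / df_within   # mean square within high schools
    f_ratio[grades] = ms_among / ms_within
    print(f"{grades} grades: {ms_among:.3f} / {ms_within:.3f} = {f_ratio[grades]:.2f}")
```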

The correlation between high-school grades and college grades is 
.52. The correlation between high-school grades and college grades 
within high schools, which is the appropriate average of fifteen correla- 
tions, one for each high school, is .59. The correlation between mean 
point averages for high schools, −.48, is quite interesting. This
correlation, although not computed in precisely this fashion, may be 
obtained by correlating the mean high-school average and the mean 
college average for the fifteen schools. These figures may be found in 
columns 2 and 3 of Table III. 
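The correlations and regression coefficients of Table I are obtained "in the usual way" from each row's sums of squares and products. The correlation is r = SP/√(SS_H · SS_C), and the regression of college grades on high-school grades is b = SP/SS_H. The sketch below applies these formulas to the three rows of Table I and reproduces the printed values. The helper name is illustrative.

```python
import math

# (SS_H, SS_C, SP_HC) for each source of variation in Table I
table_i = {
    "total": (302.41, 512.30, 205.36),
    "among means of high schools": (28.81, 19.51, -11.38),
    "within high schools": (273.60, 492.79, 216.74),
}


def correlation_and_regression(ss_h: float, ss_c: float, sp_hc: float):
    r = sp_hc / math.sqrt(ss_h * ss_c)  # product-moment correlation
    b = sp_hc / ss_h                    # regression of college grades (C) on high-school grades (H)
    return r, b


results = {source: correlation_and_regression(*sums) for source, sums in table_i.items()}
for source, (r, b) in results.items():
    print(f"{source}: r = {r:.4f}, b = {b:.4f}")
```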

A better appreciation of this negative correlation can be obtained 
from Table II. The high schools are first arranged in order according 
to the mean high-school average, highest being first. In the second 
column is given the rank of the high school based on the mean college 
average. The two highest schools, I and M, dropped to ten and
fourteen, respectively. The school which ranked eleven on high-school
average was ranked one on college average. Reference to
Table V.B in the back of Statistical Methods for Research Workers shows
that for fourteen degrees of freedom, a correlation of —.48 would occur 
by chance only about six times out of a hundred. This falls just short 
of the commonly accepted .05 level, so further investigation on this 
point would be helpful. 








TABLE II

School   Rank on basis of        Rank on basis of fall-term
         high-school average     average at M.S.C.

I             1                       10
M             2                       14
N             3                        4
L             4                        2
G             5                        8
J             6                        3
K             7                        9
F             8                        6
B             9                       15
E            10                       13
H            11                        1
O            12                       12
D            13                       11
A            14                        7
C            15                        5











The regression coefficients are also given in Table I. That within 
high schools is the correct one to use for prediction, since it allows for 
the variation between high schools. The standard error of estimate 
associated with it is .40. Obviously, accurate prediction is not 
possible even when allowance is made for the difference between high 
schools. 

Table III exhibits various statistics for individual schools for 
comparison with results already given. Column 4 gives the estimated 
mean college average and column 5 shows the error (difference between 
actual and estimate). Columns 6 and 7 give the individual corre- 
lations and regressions between high-school and college grades. 
Columns 8 and 9 give the standard deviations of high-school and of 
college grades, respectively. Along the bottom are found the same 
statistics for the fifteen high schools combined. 


TABLE III. STATISTICS ON INDIVIDUAL HIGH SCHOOLS

                   1          2          3          4           5         6         7         8         9
High school   Number of  Mean high- Mean       Estimated   Error     Corre-    Regres-   σ for     σ for
              students   school     college    college     (3)−(4)   lations   sions     high      college
                         average    average    average                                   school

A                22        1.53       1.25       1.10        .15       .47       .52       .67       .74
B                25        1.63        .96       1.18       −.22       .65       .68       .57       .60
C                48        1.45       1.26       1.03        .23       .66       .74       .50       .58
D                67        1.54       1.16       1.10        .06       .49      1.04       .47      1.01
E                27        1.57        .98       1.13       −.15       .41       .47       .57       .64
F                60        1.66       1.25       1.20        .05       .87      1.00       .63       .72
G                42        1.68       1.20       1.21       −.01       .64       .66       .58       .61
H                98        1.55       1.48       1.11        .37       .77       .72       .69       .55
I                40        2.17       1.16       1.60       −.44       .55       .74       .64       .87
J                47        1.66       1.30       1.20        .10       .64       .63       .66       .64
K                44        1.66       1.20       1.20        .00       .62       .84       .50       .68
L               112        1.69       1.36       1.22        .14       .62       .70       .64       .73
M                93        2.05        .96       1.51       −.55       .62      1.48       .51      1.21
N                54        1.76       1.29       1.28        .01       .44       .45       .60       .60
O                31        1.55       1.15       1.11        .04       .60       .71       .57       .90
Combined        810        1.69       1.23       1.23        .00       .59       .79       .59       .79























To obtain column 4 of Table III, the regression equation C − 1.23 = .7922(H − 1.69) was used. In this equation C represents the estimated college point average and H represents the high-school point average. This equation was formed from the regression coefficient within high schools and the mean high-school and college averages found at the bottom of columns 2 and 3. Column 5 is the difference between columns 3 and 4 and shows the error involved in predicting the mean college average from the mean high-school average. Two schools, I and M, are significantly lower than expected, and two others, C and H, are significantly higher. This column seems to give the best
clue on ranking the high schools, but it is not satisfactory, since there 
appears to be no great relationship between it and columns 6, 7, 8, and 
9, which also supply worth-while information. 
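The derivation of columns 4 and 5 can be checked with a short Python sketch that applies the regression equation quoted above to a few schools. Only the equation and the combined means come from the text; the function names are ours. The recomputed values agree with the printed ones to within about .01, a difference in the last place of the size rounding produces.

```python
# Sketch: columns 4 and 5 of Table III from the regression equation
#   C - 1.23 = .7922 (H - 1.69)
MEAN_HS = 1.69        # combined mean high-school point average (bottom of column 2)
MEAN_COLLEGE = 1.23   # combined mean college point average (bottom of column 3)
SLOPE = 0.7922        # regression coefficient within high schools


def estimated_college_average(hs_mean: float) -> float:
    """Column 4: estimated mean college average for a school's mean high-school average."""
    return MEAN_COLLEGE + SLOPE * (hs_mean - MEAN_HS)


def prediction_error(hs_mean: float, college_mean: float) -> float:
    """Column 5: actual minus estimated mean college average, (3) - (4)."""
    return college_mean - estimated_college_average(hs_mean)


if __name__ == "__main__":
    # (mean high-school average, mean college average) from columns 2 and 3
    schools = {"A": (1.53, 1.25), "C": (1.45, 1.26), "I": (2.17, 1.16), "M": (2.05, 0.96)}
    for name, (hs, college) in schools.items():
        print(f"{name}: estimated {estimated_college_average(hs):.2f}, "
              f"error {prediction_error(hs, college):+.2f}")
```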










The correlations given in column 6 vary from .41 to .87. Regres- 
sion coefficients in column 7 vary from .45 to 1.48. The last two 
columns give a comparison of standard deviations of high-school and 
college grades and show that variation in college grades is usually greater than that in high-school grades. Since this is to be expected, it is rather surprising to find three schools (H, J, and N) that deviate from the rule.

The results found here emphasize the existence of differences 
among high schools and indicate that predictions of college grades 
could be somewhat improved by knowledge of these differences. 
However, it seems at present that the improvement would scarcely 
justify the extra effort involved. 








THE USE OF THE KUHLMANN-ANDERSON 
INTELLIGENCE TESTS IN PRIVATE SCHOOLS 


GEORGE SPACHE 
Friends Seminary, New York City 


The Kuhlmann-Anderson Intelligence tests were first suggested in 
the testing program in private schools sponsored by the Educational 
Records Bureau* in the Fall of 1931. The tests were used again in the 
Fall of 1932 and 1933 and each Fall since 1936. As a result of their 
compilation of the results from about thirty schools each year, the 
Educational Records Bureau made several suggestions for the use of 
the tests. They suggested: 


(1) That the first grade, second semester battery be used in testing in the 
first grade in the Fall programs in place of the first grade, first semester battery 
since the latter appeared too easy for private school pupils.¹

(2) That the batteries for the first and second grades be enlarged to pro- 
vide opportunity for the more able pupils to score at appropriately higher 
levels. Tests 3-17 were to be used in the first grade and 8-21 in the second 
grade.²

(3) That pupils in grades above the second be given the batteries intended 
for the next higher grades.²


From the results of the 1936 Fall testing program, the ERB con- 
cluded that: 


the results of schools failing to administer tests 3 through 17 of the special 
first grade battery were not satisfactorily comparable with those of schools 
that did administer all the tests as recommended... . In all schools for 
which comparisons were made it was discovered that the mental ages and 
IQ’s of first-graders were significantly raised by administering the tests beyond
number 10.²


These suggestions were adopted by the one hundred or more 
schools entering into the annual Fall testing programs sponsored by 
the ERB. We are concerned here with determining whether these 
suggestions are supported by the data available from the testing of two 
representative private schools. Since the testing of the intelligence of 
entering first-graders is more difficult than testing at any other school 
level, any effort to aid in the technical details of choice of tests should 
prove valuable. 


* The Educational Records Bureau, 437 West 59th Street, New York City. 


The representative nature of the two private schools from which 
data were obtained may be deduced from Table I. 


Table I. Intelligence Quotients from the Kuhlmann-Anderson Tests

Source | Grade | N | Median | Sigma | Difference | Sigma of difference | CR
Present study | I | 94 | 108.99 | 9.7 | | |
(source illegible) | I | 205 | 114.78 | 9.88* | 5.79 | 1.51 | 3.83
(source illegible) | I | 225 | 112.4 | 10.37* | 3.41 | 1.51 | 2.25
(source illegible) | I | 497 | 113.0 | 9.71* | 4.01 | 1.36 | 2.94
Present study | II | 96 | 110.8 | 7.9 | | |
(source illegible) | II | 215 | 115.32 | 9.94* | 4.52 | 1.30 | 3.47
(source illegible) | II | 284 | 112.7 | 10.08* | 1.9 | 1.24 | 1.53
(source illegible) | II | 587 | 113.9 | 10.23* | 3.1 | 1.12 | 2.76

Differences, sigmas of differences, and critical ratios compare each source with the present study.
* Computed from author’s data.


It appears that our population is significantly lower in median IQ 
than some samples of private school children, although the median is 
probably higher than that of the general population. Later conclu- 
sions may be influenced by the fact that the sample is, on the average, 
slightly lower in IQ than the group of private schools entering into the 
ERB testing programs. 
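The critical ratios in Table I can be approximated as follows. The paper does not state how the sigma of each difference was obtained; this sketch assumes the common large-sample standard error of a median, 1.253σ/√N, for two independent samples. For the 497-case first-grade comparison it gives a sigma of difference of about 1.37 and a CR of about 2.93, against the printed 1.36 and 2.94.

```python
import math


def se_median(sigma: float, n: int) -> float:
    """Standard error of a median, using the large-sample approximation 1.253 * sigma / sqrt(N)."""
    return 1.253 * sigma / math.sqrt(n)


def compare_medians(median_a: float, sigma_a: float, n_a: int,
                    median_b: float, sigma_b: float, n_b: int):
    """Return (difference, sigma of difference, critical ratio) for two independent samples."""
    diff = median_b - median_a
    # Independent samples: the squared standard errors add.
    sigma_diff = math.hypot(se_median(sigma_a, n_a), se_median(sigma_b, n_b))
    return diff, sigma_diff, diff / sigma_diff


if __name__ == "__main__":
    # Grade I: present study (N = 94) against the 497-case comparison group in Table I
    diff, sigma_diff, cr = compare_medians(108.99, 9.7, 94, 113.0, 9.71, 497)
    print(f"difference {diff:.2f}, sigma of difference {sigma_diff:.2f}, CR {cr:.2f}")
```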

In October 1937 and 1938, the Kuhlmann-Anderson tests were 
administered to the children of Friends Seminary and Brooklyn Friends 
School in the special batteries suggested by the ERB. The results 
of this testing have been analyzed to determine the influence of the use 
of the special batteries. 

The nature of the Kuhlmann-Anderson tests permits the subject to 
secure zero or maximum (or intermediate) scores on the various tests
comprising each battery. Zero scores do not influence the median 
mental age derived from the test battery since such scores are dis- 
carded. Maximum scores are used in computing the median mental 
age and, hence, tend to lower the IQ, since they do not permit a subject 
to score at a level indicating his true ability. The per cent of zero 
and maximum scores achieved by our cases are given in Table II. 
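A small sketch shows why a maximum score lowers the IQ. The per-test mental ages, the pupil's age, and the use of a ratio IQ (100 × MA/CA) are assumptions made for illustration; the actual Kuhlmann-Anderson conversion tables are not reproduced here.

```python
from statistics import median
from typing import Optional, Sequence


def median_mental_age(test_mental_ages: Sequence[Optional[float]]) -> float:
    """Median of the per-test mental ages (in months).

    None marks a zero score; such tests are discarded before taking the median.
    A test on which the pupil reached the maximum is kept at its capped value.
    """
    scored = [ma for ma in test_mental_ages if ma is not None]
    if not scored:
        raise ValueError("every test in the battery was scored zero")
    return median(scored)


def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: 100 * MA / CA (assumed here for illustration)."""
    return 100 * mental_age_months / chronological_age_months


if __name__ == "__main__":
    ca = 72  # hypothetical pupil aged 6-0
    # Hypothetical per-test mental ages; 80 is a test on which the pupil hit the maximum.
    capped = [None, 79, 80, 83, 86, None, 90]
    # The same pupil if that test had allowed a score at his true level (here 95).
    uncapped = [None, 79, 95, 83, 86, None, 90]
    for label, tests in (("capped", capped), ("uncapped", uncapped)):
        ma = median_mental_age(tests)
        print(f"{label}: median MA {ma} months, IQ {ratio_iq(ma, ca):.1f}")
```

With the capped value the median mental age is 83 months (IQ about 115.3); had the same test allowed a score of 95 months, the median would be 86 months (IQ about 119.4).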





Table II. Per Cent of Zero and Maximum Scores on the Kuhlmann-Anderson Tests

[Table body illegible in the scanned original.]


Inspection of the table reveals that a number of the tests do not 
function efficiently in this population. In the first grade, test 14, 
which tests the ability to complete a line of figures arranged in a 
repetitious pattern, is failed by fifty-three per cent. This excessive 
failure may be accounted for by several hypotheses; namely, (1) the 
results may be an accident of sampling, (2) the children tested may be 
inferior to the general population upon whom this test was standard- 
ized, (3) the mental age norms for the test may be too low and, hence, 
(4) the test may be placed too early in the arrangement of tests. In all
probability, the test is not too difficult for the slightly superior private-
school population.

Tests 3 and 10 yield maximum scores of 6-7 and 7-9 in sixty and 
fifty-two per cent of the cases, respectively. The tests are measures of 
the abilities to mark specified objects and to mark an object illustrating 
the end of an incomplete story. If these tests are too easy for the 
present cases, they are very likely to prove too easy for the private- 
school samples. Test 3 might well be retained as an initial test since 
it would function as a shock absorber for the battery. But test 10 is of 
questionable discriminatory value. In our opinion, when maximum 
scores are achieved in either of these tests, the results should be 
discarded. For the thirty-nine per cent of pupils who achieve some 
score between zero and maximum, the test score might well be used. 
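The tabulation behind Table II, together with the recommendation to record only scores lying between zero and the maximum, can be sketched as below. The raw scores are invented, and the 50 per cent cut-off used to label a test is our own illustrative choice rather than a criterion stated in the paper.

```python
from typing import List, Sequence, Tuple


def zero_and_maximum_rates(raw_scores: Sequence[int], maximum: int) -> Tuple[float, float, List[int]]:
    """Per cent of zero scores, per cent of maximum scores, and the in-between scores kept."""
    n = len(raw_scores)
    zeros = sum(1 for s in raw_scores if s == 0)
    maxima = sum(1 for s in raw_scores if s >= maximum)
    kept = [s for s in raw_scores if 0 < s < maximum]  # only these are recorded
    return 100 * zeros / n, 100 * maxima / n, kept


def judge_test(pct_zero: float, pct_max: float, threshold: float = 50.0) -> str:
    """Hypothetical rule of thumb: flag a test when more than `threshold` per cent
    of pupils score zero (too difficult) or reach the maximum (too easy)."""
    if pct_zero > threshold:
        return "too difficult"
    if pct_max > threshold:
        return "too easy"
    return "discriminating"


if __name__ == "__main__":
    # Hypothetical raw scores of ten pupils on a test whose maximum score is 8
    scores = [8, 8, 8, 8, 8, 0, 3, 5, 6, 8]
    pct_zero, pct_max, kept = zero_and_maximum_rates(scores, maximum=8)
    print(pct_zero, pct_max, kept, judge_test(pct_zero, pct_max))  # 10.0 60.0 [3, 5, 6] too easy
```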

In the second grade, test 21, a measure of the ability to read and 
mark opposites, is too difficult for this population. The same may not 
be true in other private schools. Tests 8, 10, and 11 are definitely too 
easy to have discriminatory power at this level. Test 8 is a counting 
of pictured objects, 10 is the completion of a story, and 11, a measure 
of discrimination between objects with similar attributes. These tests 
are, in all probability, too simple for the private-school samples. If 
the use of 8 is discontinued, 9 would readily function as an easy intro- 
ductory test. Tests 10 and 11 should certainly be discarded. 

In grade III, test 15, a simple reasoning test requiring the marking 
of objects of similar nature, may be retained as an initial test despite 
the probability that it is too simple for private-school children. Only 
the scores of the forty-six per cent who make scores somewhere between 
zero and the maximum should be recorded. Above the third grade, 
the use of the batteries intended for the grade above has apparently 
eliminated the tendency to excessive zero or maximum scores. The
batteries appear to possess adequate discriminatory power in these
grades.










The conclusions concerning the lack of discriminatory value of 
certain tests in the special batteries for the first and second grades are 
supported by the evidence obtained from the correlations between the 
IQ’s derived from the special or long batteries and those from the
regular or short batteries. In the first grade the raw correlation 
between the intelligence quotients from the long and short batteries 
for ninety-four cases was +.938 ± .008. In the second grade, the raw
correlation for ninety-six cases was +.928 ± .009. The following
facts were derived from these correlations. 


Table III. Correlations Between Regular and Special Batteries of the Kuhlmann-Anderson Tests

Battery | Grade | N | Median | Mean | Sigma | Difference of means | Sigma of difference | CR
Regular | I | 94 | 108.46 | 108.7 | 9.5 | | |
Enlarged | I | 94 | 108.99 | 109.0 | 9.7 | .30 | .34 | .88
Regular | II | 96 | 109.99 | 109.3 | 7.8 | | |
Enlarged | II | 96 | 110.8 | 110.5 | 7.9 | 1.20 | .30 | 4.00





























In the first grade, the chances of a real difference greater than zero are only 8,106 in 10,000, an unreliable difference. The median difference between the IQ’s on the two arrangements of the tests was 1.89 points. The range of difference was from nine points higher to seven points lower on the enlarged battery. It would appear that the use of an enlarged battery has not resulted in a reliable increase in the mean IQ, as the experience of the Educational Records Bureau led them to presume. Of the four tests added to the regular battery, it has been shown that test 3 was too simple and, hence, would not tend to raise the median, and that test 14 was too difficult. The excessive number of zero scores on this latter test nullified the fact that it gave opportunity for superior pupils to score at higher levels.*

* These results confirm the similar tentative conclusions offered in an earlier article by the writer.⁵

In the second grade, the chances of a real difference greater than zero are practically one hundred in one hundred, a wholly reliable difference. The median difference between the IQ’s was 1.82 points. The range of differences was from five points lower to fourteen points higher on the enlarged battery. Despite the close correlation between the IQ’s, those obtained from the enlarged battery were reliably higher, on the average, than those from the shorter, regular battery.

If, in accordance with the earlier suggestions, tests 8, 10 and 11, in 
which excessive maximum scores occur, were discarded, this difference 
in favor of the enlarged battery would undoubtedly be increased. 
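The critical ratios in Table III and the chances quoted above can be reproduced approximately with the usual formula for the standard error of the difference between two correlated means, σ_D = √(σ₁² + σ₂² − 2rσ₁σ₂), where σ₁ and σ₂ are the standard errors of the two means. The paper does not state its formulas, so this is a sketch under that assumption. With the correlations given in the text, the recomputed sigmas of the differences (about .35 and .30) agree with the printed .34 and .30. The one-tailed normal-curve value for the printed first-grade CR of .88 gives the 8,106 in 10,000 cited.

```python
import math


def se_mean(sigma: float, n: int) -> float:
    return sigma / math.sqrt(n)


def correlated_means_cr(mean_1, sigma_1, mean_2, sigma_2, n, r):
    """Difference between two means obtained from the same N pupils, the sigma of
    that difference allowing for the correlation r between the measures, and the CR."""
    s1, s2 = se_mean(sigma_1, n), se_mean(sigma_2, n)
    sigma_diff = math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)
    diff = mean_2 - mean_1
    return diff, sigma_diff, diff / sigma_diff


def chances_in_10000(cr: float) -> float:
    """Chances in 10,000 that the true difference exceeds zero (one-tailed normal curve)."""
    return 10000 * 0.5 * (1 + math.erf(cr / math.sqrt(2)))


if __name__ == "__main__":
    # Regular vs. enlarged battery: means and sigmas from Table III, r from the text
    for grade, args in (("I", (108.7, 9.5, 109.0, 9.7, 94, 0.938)),
                        ("II", (109.3, 7.8, 110.5, 7.9, 96, 0.928))):
        diff, sigma_diff, cr = correlated_means_cr(*args)
        print(f"Grade {grade}: diff {diff:.2f}, sigma_diff {sigma_diff:.2f}, CR {cr:.2f}")
    print(round(chances_in_10000(0.88)))  # 8106, using the CR as printed for grade I
```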

On the basis of the evidence introduced herein, certain conclusions 
may be offered: 


(1) The use of an enlarged battery of the Kuhlmann-Anderson tests in the 
first grade is unwarranted in private-school testing. 

(a) Because of the lack of reliable difference between the mean IQ’s 
obtained from the two arrangements of the tests. 

(b) Because, of the tests added, one is too easy, one is too difficult, and
only two function with discrimination.

(2) Maximum scores on test 10 should be discarded because of the lack of 
discriminatory power of this test in private-school testing in the first two 
grades. 

(3) The use of an enlarged battery of the Kuhlmann-Anderson tests, con-
sisting of tests 9 and 12 through 20, is warranted in private-school testing in
the second grade.

(a) Because of the demonstrated rise in mean IQ by use of the addi- 
tional tests. 
(b) Because these tests possess adequate discriminatory power.

(4) In the use of an enlarged battery in grade II, tests 8, 10 and 11 should 
be discarded since they lack discriminatory power in private-school testing. 

(5) The use of the Kuhlmann-Anderson battery intended for the grade 
above is warranted in private-school testing in grades III-VIII because of the
elimination of excessive zero and maximum scores. 


BIBLIOGRAPHY 


1. “1933 Fall Testing Program in Independent Schools.” Educational Records Bulletin, No. 12. Educational Records Bureau, New York City.

2. “1936 Fall Testing Program in Independent Schools and Supplementary Studies.” Educational Records Bulletin, No. 19. Educational Records Bureau.

3. “1937 Fall Testing Program in Independent Schools.” Educational Records Bulletin, No. 22. Educational Records Bureau.

4. “1938 Fall Testing Program in Independent Schools and Supplementary Studies.” Educational Records Bulletin, No. 26. Educational Records Bureau.

5. Spache, George: “Mental Tests in the First Grade.” Elementary School Journal, Vol. XXXIX, December, 1938, 289-297.











THE EFFECT OF METHOD OF RESPONSE UPON THE 
VALIDITY OF MULTIPLE-CHOICE TESTS 


DAVID F. VOTAW 
Southwest Texas Teachers College 
AND 


LILY DANFORTH 
Liberty Hill, Texas, High School 


Recent increased use of mechanical methods of responding to 
objective test items in order to permit easy and rapid scoring has 
raised questions about the effect such methods of responding may have 
upon the validity of scores. This investigation, preliminary in nature, 
attempts to detect differences in multiple-choice test score results 
which arise from different methods of responding to the items. More 
specifically, the question is: Will a multiple-choice test measure to 
some extent traits other than those which it purports to measure if a 
mechanical method of responding is required of pupils? 

Three methods of responding to a test of fifty items were considered, 
one method being defined as a natural response and two being defined 
as mechanical responses: 


Method A—(mechanical response) key number of selected answer to be 
written in a marginal space. 

Method B—(natural response) selected answer to be underlined. 

Method C—(mechanical response) check mark to be placed in an appropri- 
ate square or space on a separate answer sheet. (For example, a choice of the 
third answer would be indicated by checking the third square on the answer 
sheet.) 


Fifty-nine pupils of Grades VII, IX, and XI were used as subjects. 
These pupils were first given a mechanical-transfer ability test of fifty 
items which was prepared by the writers. Each pupil was provided 
with an answer sheet (the same one used later in test method C) and a 
sheet of fifty directions reading as follows: 


On your answer sheet place the figures or letters in the squares as directed: 
1. Place 2 in the third square of the fourth row of squares. 
2. Place 4 in the fifth square of the ninth row of squares. 
and so on to 
50. Place c in the fourth square of the sixth row of squares.


The time required for each pupil to complete the test was recorded. After the papers were scored for accuracy, a mechanical-transfer index score (MTI score) was determined for each pupil: the product of the individual raw score and the quotient of the group average time divided by the individual time. The range of MTI scores was approximately one hundred fifty, the mean being about ninety and the standard deviation about twenty-seven. There was a positive, though low, correlation of these scores with intelligence quotients, the coefficient being +.59 ± .06.

After arranging the fifty-nine pupils in order of MTI scores the 
highest sixteen were designated high in mechanical transfer ability 
and the lowest sixteen were designated low in mechanical transfer 
ability. The purpose of these classifications was to compare the two
groups later with respect to score losses resulting from the requirement 
to respond mechanically to test items of an ordinary multiple-choice 
achievement test. 
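The MTI score and the division into high and low groups can be sketched as follows. The pupils, scores, and times are invented; in the study each group contained sixteen pupils, which here is the parameter `k`.

```python
from statistics import mean
from typing import Dict, List, Tuple


def mti_scores(raw: Dict[str, float], minutes: Dict[str, float]) -> Dict[str, float]:
    """MTI = raw score * (group average time / individual time)."""
    average_time = mean(minutes.values())
    return {pupil: raw[pupil] * average_time / minutes[pupil] for pupil in raw}


def high_and_low(scores: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """The k highest and the k lowest pupils by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


if __name__ == "__main__":
    raw = {"p1": 48, "p2": 45, "p3": 40, "p4": 50}
    minutes = {"p1": 10, "p2": 15, "p3": 12, "p4": 8}  # group average 11.25 minutes
    mti = mti_scores(raw, minutes)
    print(mti)                     # {'p1': 54.0, 'p2': 33.75, 'p3': 37.5, 'p4': 70.3125}
    print(high_and_low(mti, k=1))  # (['p4'], ['p2'])
```

A fast, accurate pupil is rewarded twice over: p4 has the highest raw score and also the shortest time, so his MTI score exceeds his raw score by the ratio 11.25/8.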


Table I. Comparison of Gains in Raw Scores Made by Pupils of Low Mechanical Ability with Similar Gains by Pupils of High Mechanical Ability When Permitted to Indicate Answers by Natural Response Instead of by Mechanical Response

Comparison of | Classification of ability | M | SD | PE_M¹ | Diff_M | PE_D | Diff_M/PE_D | Probability due to chance
Difference between scores on A (mechanical response) and B (natural response) | Low | 9.75 | 8.24 | 1.22 | | | |
 | High | .36 | 4.81 | .71 | 9.39 | 1.41 | 6.66 | less than .001
Difference between scores on B (natural response) and C (mechanical response) | Low | 2.25 | 7.46 | 1.10 | | | |
 | High | .12 | 6.69 | .99 | 2.13 | 1.48 | 1.44 | .166


























¹ In computing probable errors of means for this table and also for Table II,
the high and low groups are treated as samples from a limited population.


All of the pupils had been drilled previously on the general subject-matter of the achievement test. The achievement test was administered three times to all pupils, with a different method of response each time, the order of methods of response being varied in all possible ways. The reliability of the test (Spearman-Brown formula applied to the correlation between odd- and even-item scores) did not vary significantly from one method of response to another. For method A the reliability was +.89 ± .02; for method B it was +.86 ± .02; and for method C it was +.89 ± .02.
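The odd-even reliability reported above can be sketched in a few lines: correlate each pupil's score on the odd-numbered items with his score on the even-numbered items, then step the half-test correlation up with the Spearman-Brown formula, 2r/(1 + r). The item scores below are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(item_scores: Sequence[Sequence[int]]) -> float:
    """item_scores[p][i] is pupil p's score (1 right, 0 wrong) on item i + 1."""
    odd = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


if __name__ == "__main__":
    # Hypothetical four-item test taken by four pupils
    scores = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(round(odd_even_reliability(scores), 3))  # 0.9
```

Here the half-test correlation is 9/11 (about .82), and the formula raises it to .90 for the full test; a full-test reliability of .89, as for methods A and C, corresponds to a half-test correlation of about .80.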

To carry out the original purpose of the study each pupil’s method 
A (mechanical response) raw score was subtracted from his method B 
(natural response) raw score, and also each pupil’s method C (mechani- 
cal response) raw score was subtracted from his method B raw score. 
Comparisons of these differences between the two groups previously 
found to be high in mechanical-transfer ability and low in mechanical- 
transfer ability are shown in Table I. 
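The comparisons in Tables I and II rest on the probable error of a difference and on converting the ratio D/PE_D into a chance probability. The sketch below assumes that the last column of each table is the one-tailed normal-curve probability, with one probable error taken as 0.6745σ. The paper does not say so explicitly, but under that assumption the sketch reproduces the printed values .166 and .038. The PEs of the means are taken from the tables rather than recomputed, so the finite-population treatment mentioned in the footnote is not modeled.

```python
import math

PE_PER_SIGMA = 0.6745  # one probable error equals 0.6745 standard deviations


def pe_of_difference(pe_1: float, pe_2: float) -> float:
    """Probable error of the difference between two independent means."""
    return math.hypot(pe_1, pe_2)


def chance_probability(diff: float, pe_diff: float) -> float:
    """One-tailed normal-curve probability of a difference at least this large arising by chance."""
    z = PE_PER_SIGMA * diff / pe_diff  # convert the ratio from PE units to sigma units
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Raw-score gains, B minus C, low vs. high group (Table I): means and PEs of means
    diff = 2.25 - 0.12
    pe_d = pe_of_difference(1.10, 0.99)
    print(f"D = {diff:.2f}, PE_D = {pe_d:.2f}, D/PE_D = {diff / pe_d:.2f}, "
          f"p = {chance_probability(diff, pe_d):.3f}")
```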

In the administration of the achievement tests, although each pupil was allowed sufficient time to try all items, his time was recorded and a “time index score” was determined for him on each of the three methods of response. The steps taken to secure the data for Table I on raw scores were repeated for the “time index scores.” The results are assembled in Table II.

Table II. Comparison of Gains in Time Index Scores Made by Pupils of Low Mechanical Ability with Similar Gains by Pupils of High Mechanical Ability When Permitted to Indicate Answers by Natural Response Instead of by Mechanical Response

Comparison of | Classification of ability | M | SD | PE_M | Diff_M | PE_D | Diff_M/PE_D | Probability due to chance
Difference between scores on A (mechanical response) and B (natural response) | Low | 6.48 | 8.87 | 1.31 | | | |
 | High | 2.93 | 2.05 | .30 | 3.55 | 1.35 | 2.63 | .038
Difference between scores on B (natural response) and C (mechanical response) | Low | 3.86 | 3.30 | .49 | | | |
 | High | 1.29 | 1.57 | .23 | 2.57 | .54 | 4.76 | .001





























The average time required for pupils to take the achievement tests 
by use of response A was almost exactly the same as the average time 
required by use of response C. However, this average time for 
mechanical responses was about eighteen per cent longer than the time 
required for natural responses. 
From Table I, which deals with raw scores uninfluenced by variations in the time taken by individual pupils, it may be seen that a significant loss is suffered by the group low in mechanical ability when a mechanical type of response such as method A is imposed. The loss in raw scores suffered by the low group when a type of response such as method C is imposed may readily be explained, on the data of Table I, as a chance happening; these losses become highly significant, however, when the time spent in taking the test is brought into the comparison, as shown in Table II.

In answer to a question about which method of response was 
preferred, sixty-five per cent of the pupils indicated method B (natural 
response), eleven per cent indicated method A, and eleven per cent 
indicated method C. (Thirteen per cent failed to answer.) Most of 
the pupils who indicated a preference for either of the mechanical 
methods of response were found to rank high in mechanical transfer 
ability. The novelty of the mechanical methods might attract the 
interests of pupils of this type. 

The ability to improve scores by the use of the natural response 
instead of a mechanical response was studied for each grade (VII, IX, 
and XI) separately. Although no significant differences were found, 
there was suggestive evidence to support the statement that the lower 
the grade the greater the loss in validity from mechanical methods of 
response. 


IMPLICATIONS 


(1) The validity of a multiple-choice achievement test is reduced 
when pupils of grades VII to XI are required to respond to the test 
items by mechanical methods. Apparently the resulting scores 
represent in part an ability in mechanical-transfer as well as an ability 
in the subject-matter of the test. 

(2) Loss in validity appears to be less when responses are made by 
a check mark in appropriate squares or spaces on a separate answer 
sheet than when made by writing the key numbers of answers in 
marginal spaces on the test sheet. The separate answer sheet provides 
the type of response usually required for use in scoring by mechanical 
scoring devices (punch boards, electrical machines, and so on). 
Liberal time allowances with this type of response may reduce validity 
losses to a point of little or no importance for grades VII to XI. 

(3) Pupils generally prefer to respond to multiple-choice test items 
by the natural method of marking or underlining their choice of 
answer.

(4) Necessary time for administering a multiple-choice test is 
reduced when natural methods of responding are permitted. 







NOTE ON THE 
MULTIPLE TRUE-FALSE TEST EXERCISE 


LEE J. CRONBACH 
University of Chicago 


The multiple true-false examination, potentially one of the most 
useful of objective test types, has hitherto remained far less popular 
than its merits deserve. The form is not new, having been described 
many times under such names as “multiple choice, plural response,”¹
“plural multiple-answer,”² and “plural choice.”³

In the customary form of the item, the student is instructed either 
to select (a) “all the correct responses,” or (b) “two (e.g.) correct
responses” from each group of alternatives.

Examples of the item in its usual form are: 


America imports most of her 

(a) cotton (b) lumber (c) oil (d) rubber (e) tin (f) wheat. 

The Supreme Court has the power to 

(a) Remove the President from office. 

(b) Decide cases involving ambassadors from foreign countries. 

(c) Declare acts of Congress unconstitutional. 

(d) Pardon a murderer sentenced for a crime committed in the District of 
Columbia. 

(e) Declare state laws unconstitutional. 


Many inconveniences arise in attempting to build tests of this type 
according to traditional directions. Despite that fact, it appears 
likely that the item has much merit, in view of the favorable treatment 
the literature accords it. Haven and Copeland⁴ present statistical
considerations which indicate that this form is definitely superior, for
measurement purposes and from the validity standpoint, to the ordinary
one-choice multiple-response test.

Only one experimental study is reported. Scheidemann⁵ (apparently without reference to the directions given by early writers) prepared a four-response test, only one response in each exercise being correct. The test was given twice to a group of sixteen college psychology students. The first time, directions were given to mark all correct responses; the second time, to mark only the one correct answer. She reports that, insofar as her small group warrants generalization, the plural-choice form of test item has marked advantages, apparently “the same as those obtained from lengthening a test.”



A logical analysis of the exercise has developed a set of suggestions 
to simplify and improve the traditional directions. It is believed 
that their application can greatly enhance the usefulness of the test. 

It is necessary for the student, in responding to an item of this 
type, to examine each alternative separately and make a “yes-no”
decision with respect to it. In other words, this is not a variant of
the multiple-choice test, but is rather a set of related true-false items.
For this reason, it is referred to in this paper as the “multiple true-
false” test.

It seems quite possible that treating this item as analogous to the 
multiple-choice exercise has led to most of the defects referred to above. 

The traditional procedure, which usually called for underlining each correct choice, made no distinction possible between an omission and a definite “no” decision on a subitem. If we are to give credit for all work done correctly, we should give credit for correct negative decisions. The only apparent way to overcome this difficulty is to use true-false symbols: + for true alternatives, 0 for false ones. Doubtful or unfamiliar items should be left blank, in accord with the evidence that “do not guess” instructions increase validity. This procedure has the added value of increasing the number of responses, which should improve reliability.

An additional advantage lies in eliminating cumbersome scoring 
methods. Orleans and Sealy¹ have been the only writers to suggest
correcting for guessing in this test. They note that, if four responses 
out of eight are correct, and a student marks six indiscriminately, he 
is certain to get two, and maybe four, of the correct answers. As a 
possible solution, they consider and reject as impractical the plan of 
presenting four or five times as many confusions as “true” items. 
The only alternative they note is the clerically burdensome scheme of 
scoring one point for each correct underline, and deducting one for 
each underline incorrectly made. 
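The arithmetic of Orleans and Sealy's example follows from simple counting: a student who marks more choices than there are false ones must hit some true ones. A small sketch, with a function name of our own:

```python
from typing import Tuple


def hits_from_marking(n_choices: int, n_true: int, n_marked: int) -> Tuple[int, int]:
    """Fewest and most true choices a student can hit by marking n_marked choices blindly."""
    if not 0 <= n_marked <= n_choices or not 0 <= n_true <= n_choices:
        raise ValueError("counts must lie between 0 and n_choices")
    n_false = n_choices - n_true
    fewest = max(0, n_marked - n_false)  # marks left over once every false choice is used up
    most = min(n_marked, n_true)
    return fewest, most


if __name__ == "__main__":
    print(hits_from_marking(8, 4, 6))  # (2, 4): four true out of eight, six marked
```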

This problem is immediately simplified, however, if choices are 
presented in a vertical alignment. The student makes his + and 
0 marks in front of the choices, and, appearing as they do in a straight 
line, they can be easily marked by an ordinary true-false key. Scoring 
by the right-minus-wrong formula is then very easy. This eliminates
the need for the troublesome cutout or transparent stencil for correcting. 
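The scoring procedure proposed here can be expressed compactly. In the sketch below each choice is marked “+”, “0”, or left blank; blanks are neither credited nor penalized, and the score is rights minus wrongs. The key shown for the Supreme Court exercise given earlier is our own reading, supplied only for illustration.

```python
from typing import Optional, Sequence


def right_minus_wrong(key: Sequence[bool], marks: Sequence[Optional[str]]) -> int:
    """Score a multiple true-false exercise.

    key[i] is True if choice i is true; marks[i] is '+' (judged true),
    '0' (judged false), or None (left blank because the student is unsure).
    """
    if len(key) != len(marks):
        raise ValueError("key and marks must have the same length")
    right = wrong = 0
    for is_true, mark in zip(key, marks):
        if mark is None:
            continue  # omissions earn nothing and cost nothing
        if mark not in ("+", "0"):
            raise ValueError(f"unexpected mark {mark!r}")
        if (mark == "+") == is_true:
            right += 1
        else:
            wrong += 1
    return right - wrong


if __name__ == "__main__":
    # Supreme Court exercise, choices (a)-(e); illustrative key
    key = [False, True, True, False, True]
    marks = ["0", "+", "+", "+", None]
    print(right_minus_wrong(key, marks))  # 3 right - 1 wrong = 2
```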

Another point involved in the construction of these tests has been neglected by writers. Experimentation is necessary to determine the most satisfactory total number of responses offered per item. We should also have a guide to the proper proportion of true and false subitems in each group. It seems reasonable to set up the hypothesis that the percentage of “true” responses in an exercise should vary from zero to 100, in accord with chance distribution, throughout the test. Many exercises will then be divided about half and half between true and false responses, but a student cannot use that expectation as a clue to his answers.

This also implies that the total number of true responses in the test 
should equal the number of confusions employed. Under the more 
usual system, it has been customary to ask for a “yes-no” decision
with the chance as much as four or five to one in favor of the “no.”
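One way to meet both conditions at once (proportions varying by chance from exercise to exercise, and about as many true as false choices in the whole test) is to decide each choice's truth by a fair coin when the key is planned. This is our illustration of the hypothesis, not a procedure given in the paper; the test maker would then write true or false choices to fit the key.

```python
import random
from typing import List, Optional


def chance_key(n_exercises: int, choices_per_exercise: int,
               seed: Optional[int] = None) -> List[List[bool]]:
    """Make each choice true or false with probability 1/2, independently.

    The number of true choices per exercise then follows a binomial distribution,
    ranging from none to all, while true and false choices are about equally
    frequent over the whole test.
    """
    rng = random.Random(seed)
    return [[rng.random() < 0.5 for _ in range(choices_per_exercise)]
            for _ in range(n_exercises)]


if __name__ == "__main__":
    key = chance_key(40, 5, seed=1)
    true_counts = [sum(exercise) for exercise in key]
    total_true = sum(true_counts)
    print("true choices per exercise:", true_counts)
    print("true:", total_true, "false:", 40 * 5 - total_true)
```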

With these modifications, the test can fill a definite place in the
tester’s equipment. Items of this sort are particularly serviceable in
measuring knowledge of generalizations or classifications. One exer-
cise of the multiple-true-false type can measure knowledge of a concept 
or technical term as well as five exercises like those now employed in 
vocabulary testing. 

This test is also particularly adapted to examining on knowledge of 
related or similar facts, as evidenced in the examples given above. 


SUMMARY 


(1) The so-called plural choice test exercise is more nearly a true- 
false than multiple-choice type of test. In view of this, the name 
“multiple true-false” is suggested. 

(2) Directions to underline correct choices should be replaced by a 
system requiring the student to differentiate between choices recog- 
nized as false, and choices of which he is uncertain. 

(3) Arranging choices vertically, and using the symbols + and 0, 
as in a true-false test, is suggested. 

(4) The proportion of true responses in each exercise should vary 
throughout the test, approximately in accord with chance distribution. 

(5) The number of “true” responses in the entire test should be
approximately equal to the number of confusions. 

All these suggestions are, like all a priori reasoning, in need of
experimental verification. In view of the potential contribution of
the multiple true-false test type, it appears a worth-while field for
further investigation.


REFERENCES 


1. Orleans, J. S., and Sealy, G. A.: Objective Tests. Yonkers: World Book Com-
pany, 1928, pp. 223-226. 




2. Odell, C. W.: Traditional Examinations and New-Type Tests. New York: 
Century, 1928, pp. 310-316. 

3. Lang, A. R.: Modern Methods in Written Examinations. Boston: Houghton- 
Mifflin, 1930, p. 143. 

4. Haven, 8. E., and Copeland, H. A.: “A note on the ‘multiple choice’ test.” 
Journal of Applied Psychology, Vol. XVI, April, 1932, pp. 219-221.

5. Scheidemann, N. V.: “Multiplying the possibilities of the multiple choice form
of objective question.” Journal of Applied Psychology, Vol. XVII, June, 1933,
pp. 337-340. 

6. Ruch, G. M.: The Objective or New-Type Examination. New York: Scott, 
Foresman, 1929, pp. 318-357. 







A PRELIMINARY NOTE ON THE VOCABULARY TEST 
IN THE REVISED STANFORD-BINET SCALE, FORM L 


MARY ISABEL ELWOOD 
Pittsburgh Public Schools 


It is a little too early to make definite statements on the relative 
values of the 1916 and the 1937 Stanford-Binet. However, since the 
vocabulary test has been the most criticized of all the subtests of the 
old Stanford Revision, a preliminary note on the new vocabulary 
scores may not be out of place. 

McFadden,² Phillips,³ and Wallin⁴ found the vocabulary test to be
one of the most difficult, if not the most difficult, of the subtests of the 
1916 revision at the age levels they studied. This was even more true 
for the Pittsburgh school children studied by the present author. 


Table I. CA, MA, IQ, and Vocabulary Scores of Eleven Hundred Sixty-one Pittsburgh School Children

Measure | Range | Mean | Sigma
CA | 5-7 to 19-2 | 10-11.9 | 2-9.24
MA | 3-7 to 21-9 | 9-1.0 | 3-11.29
IQ | 41 to 178 | 85.45 | 24.3
Vocabulary score | 0 to 37 | 9.75 | 6.69














The new scale, with its shortened vocabulary list and lowered 
quantitative standards of success for the various age levels, would 
appear to correct this difficulty. The present preliminary study has 
been undertaken in order to throw some light on the question of 
whether it has corrected the former error or whether it may have 
over-corrected it. 

The subjects included in the study are all the Pittsburgh school
children tested during the school year 1937-1938 whose mental age
reached or exceeded the four-year level. (The midpoint of the age
level has been taken at the year, so that the level studied has included
mental ages from six months below to six months above the year.)
The tests were all given and scored by the same examiner, so procedure
has been held constant. Table I shows the composition of the group.
It will be seen from this tabulation that the group is far from
homogeneous and that the distribution is not a normal one. This need
not invalidate the study, however, since many of the studies published
on the 1916 revision did not use normal distributions. The four-year
mental age level was chosen as the lower limit because it was the first
level at which vocabulary scores were obtained (although the first
level at which the vocabulary carries credit is year VI).
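The CA and MA entries in Table I use the years-months notation (10-11.9 means 10 years 11.9 months), and the age-level rule above puts each level's midpoint at the whole year. The sketch below illustrates both under stated assumptions: the function names are hypothetical, IQ is taken to be the conventional ratio IQ (100 × MA/CA) of the Stanford revisions, and a mental age exactly six months above a year is assigned to the next level, because the text does not say which side of the boundary it belongs to.

```python
import math


def parse_years_months(value: str) -> float:
    """Convert a years-months entry such as '10-11.9' into total months."""
    years, months = value.split("-")
    return int(years) * 12 + float(months)


def ratio_iq(ma_months: float, ca_months: float) -> float:
    """Ratio IQ: 100 times mental age divided by chronological age."""
    return 100.0 * ma_months / ca_months


def mental_age_level(ma_months: float) -> int:
    """Year level whose band runs from six months below to six months above it.

    Assumption (hypothetical): an MA exactly six months above a year
    is counted in the next level.
    """
    return math.floor(ma_months / 12 + 0.5)


# The lowest MA in Table I (3-7) already falls in the year IV band.
print(mental_age_level(parse_years_months("3-7")))  # 4
# A hypothetical child with MA 9-0 and CA 10-0.
print(ratio_iq(parse_years_months("9-0"), parse_years_months("10-0")))  # 90.0
```

Note that the minimum MA reported in Table I, 3-7, lands in the year IV band, which agrees with the rule that only children at or above the four-year level were included.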

The correlation between mental age and vocabulary score is high, 
the actual Pearson r being .978, with a probable error of .0009. 
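The probable error is the older way of stating the sampling error of r; for a Pearson coefficient it is conventionally 0.6745(1 − r²)/√N. The sketch below computes r from paired scores and then its probable error. It assumes, since the text does not say so outright, that r was computed on all 1,161 children of Table I; under that assumption the formula reproduces the reported .0009. The function names are illustrative only.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error_r(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Assumed N = 1161 (the whole group described in Table I).
print(round(probable_error_r(0.978, 1161), 4))  # 0.0009
```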

This, of course, does not answer the question raised at the beginning 
of the study, since the vocabulary score could correlate very highly 
with the mental age even if the test were scaled either too high or too 
low. For this reason the vocabulary scores in the mental age groups 
between year IV and year XI have been studied separately. The 
upper mental age ranges are not used, because of the small number of 
cases tested at each level. Table II shows the result of this study. 
The mean vocabulary scores for years VI, VIII, and X are very close 
to the credit standards for those years, and the percentages of success 
at these levels, as well as at those just preceding and following them, 
give further indication that the norms are approximately correct for 
the children used in this study. 
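Each “per cent of success” figure in Table II is the share of children in one MA group whose vocabulary score reaches the credit standard of the year in question. A minimal sketch, assuming the standard is a minimum number of words correctly defined; the standards themselves are not listed in this note, so the threshold and scores in the example are hypothetical.

```python
from typing import Sequence


def percent_success(scores: Sequence[int], standard: int) -> float:
    """Per cent of an MA group whose vocabulary score meets the credit standard."""
    if not scores:
        raise ValueError("the MA group is empty")
    passed = sum(1 for score in scores if score >= standard)
    return 100.0 * passed / len(scores)


# Hypothetical scores for a small MA group, with a hypothetical standard of 5 words.
group = [3, 5, 6, 4, 8]
print(percent_success(group, standard=5))  # 60.0
```

The published figures follow the same arithmetic: 5.36 per cent of the 56 children at MA IV corresponds to 3 children (100 × 3/56 ≈ 5.36), a count back-calculated here rather than reported in the text.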


Table II.—Vocabulary Scores at MA’s IV to XI

MA                            IV     V      VI     VII    VIII   IX     X      XI
Number of cases               56     97     190    175    158    111    68     55
Vocabulary range              0-7    0-7    1-9    2-10   4-13   5-15   7-16   8-18
Mean vocabulary               1.57   3.49   4.82   6.79   7.92   9.18   10.81  12.53
Per cent of success at VI     5.36   29.80  60.53  95.43  98.73  100    100    100
Per cent of success at VIII   ....   ....   2.12   28.00  63.29  84.68  97.06  100
Per cent of success at X      ....   ....   ....   ....   5.06   19.00  48.53  83.64
Per cent of success at XII    ....   ....   ....   ....   ....   1.80   10.29  32.73
Per cent of success at XIV    ....   ....   ....   ....   ....   ....   5.88   12.73

Table III is included in order to show a rough comparison between
these percentages of success and those found by the same examiner in
an earlier study on the 1916 revision.¹ For this comparison, only the
VIII- and X-year vocabulary tests are used, because the 1916 revision
did not score vocabulary at year VI. The figures tabulated are not
strictly comparable, because different children were used in the two
studies and the numbers were larger in the earlier study. However,
the cosmopolitan character of the population from which the two samples
are taken has not changed materially since the earlier tabulation; so,
while the comparison cannot be considered exact, it is at least
interesting.

Table III.—Comparison of Successes on Vocabulary Tests at Years VIII
and X on the 1916 and the 1937 Revision

             Per cent of    Per cent of    Per cent of    Per cent of
             success at     success at     success at     success at
             year VIII,     year VIII,     year X,        year X,
             1916           1937           1916           1937
MA VI          0.0            2.12           0.0            0.0
MA VII         1.5           28.00           0.0            0.0
MA VIII       11.5           63.29           0.0            5.06
MA IX         53.5           84.68           0.8           19.00
MA X          96.1           97.06          22.3           48.53

This brief study is not presented as conclusive, because of the
relatively small number of cases and because of the skewed character
of the distribution. However, it does offer an indication that the new
revision, at least through year X, has corrected, without
overcorrecting, the relatively too great difficulty of the 1916
vocabulary test.
BIBLIOGRAPHY

1. Elwood, M. I.: “A Statistical Study of Results of the Stanford Revision of the
Binet-Simon Scale with a Selected Group of Pittsburgh School Children.”
Pittsburgh Schools, Vol. IX, 1935, pp. 116-140.

2. McFadden, J. H.: “Differential Responses of Normal and Feeble-minded
Subjects of Equal Mental Age on the Kent-Rosanoff Free Association Test
and the Stanford Revision of the Binet-Simon Intelligence Test.” Mental
Measurement Monographs, No. 7, 1931.

3. Phillips, A.: “An Analytical and Comparative Study of the Binet-Simon Test
Response of 1306 Philadelphia School Children.” Psychological Clinic,
Vol. XXI, 1932, pp. 1-38.

4. Wallin, J. E. W.: “A Statistical Study of the Individual Tests in Ages VIII and
IX in the Stanford-Binet Scale.” Mental Measurement Monographs, No. 6,
1929.

BOOK REVIEWS 


Frank N. Freeman. Mental Tests: Their History, Principles, and 
Applications. (Revised Edition), Boston: Houghton Mifflin 
Company, 1939; pp. 460. 


The plan of Freeman’s revision of his earlier successful text follows 
very much the same general pattern as its predecessor. A brief 
historical treatment of the development of mental tests is followed by a 
description of the techniques of test construction and a consideration 
of the interpretation of test results. As was true in the 1926 edition, 
the greatest emphasis is upon intelligence testing. Attention is called, 
often very briefly, to the measurement of various aspects of personality 
(one chapter) but most of the theoretical discussion as well as most of 
the practical illustrations relate to intelligence. 

If the assumption is made that Freeman’s earlier book is familiar to
most psychologists and educators, and that assumption seems justified
in view of its wide use, there is not a great deal to say about the
revision. The following six of the sixteen chapters appear without
change of any significance: Chapter I, “Present status, meaning, and
fields of application of mental tests”; Chapter II, “Earlier
experimentation with tests”; Chapter V, “The early development of
point scales”; Chapter X, “Technique and theory of mental tests”;
Chapter XI, “Problems relating to the selection and organization of
the items of a test”; and Chapter XII, “How to tabulate the results of
tests.”

Three other chapters, IV, “Age scales: The Binet scales and their
revisions”; IX, “Technique and theory of mental tests: Subject-matter
of tests and related problems”; and XIV, “The educational uses of
tests,” are changed in minor respects only. In Chapter IV a few pages
have been added describing the new Stanford-Binet tests; Chapter IX
includes three added pages on tests to measure the primary mental
abilities; and Chapter XIV includes occasional new paragraphs and a
page or so on recent evidence bearing upon the problem of ability
grouping.

The major additions in the second edition in comparison with the 
first are as follows: 

1. An expansion of the treatment of “factor analysis” from one to
six pages (Chapter III).

2. Some nine pages devoted to a description of new point scales
which have appeared since 1925 (Chapter VI).


3. About twelve pages of descriptions of new aptitude tests— 
music, art, clerical, academic, etc. (Chapter VII). 

4. A rather long section (seventeen pages) on personality and 
character tests (Chapter VIII). 

5. Five pages calling attention to new methods of measuring incre- 
ments of mental growth (Chapter XI). 

6. A revised and up-to-date discussion (seven pages) of the
constancy of the IQ, and a new section (seven pages) on the relationship
between conduct and intelligence (Chapter XIII).

7. Approximately twelve pages bringing the discussion on the 
inheritance of intelligence up-to-date (Chapter XV). 

8. A completely rewritten chapter on “The nature of ability.”

The reviewer studied Freeman’s first edition shortly after it 
appeared in 1926 and was impressed by the author’s ability to make 
judicious inferences from conflicting data. The relatively few changes 
in interpretation which appear in the revised edition are a tribute to a 
type of judgment which does not always characterize authors of texts 
in the field of psychological measurements. As is implied above, to 
those who were thoroughly familiar with the 1926 Mental Tests the 
revision will seem but slightly different. Certain sections have been 
added and one chapter completely rewritten but the reader gets no 
impression of a new text. Seventy-five per cent of the pages have not
been changed. 

To cover the entire field of Mental Measurement in a single volume 
has become too ambitious a task. This was not true fifteen years ago, 
but since 1926 so much has been done with respect to personality 
measurement alone as to render any attempt to cover that field, even
superficially, in one chapter an impossible assignment. In his revision
Freeman has paid no attention to achievement testing as such, and it 
would probably have been all to the good had he limited himself to 
intelligence without making any attempt to introduce his readers to 
mechanical aptitude, the measurement of character and other concepts 
and practices which are outside the area in which he has made many 
major contributions. Stephen M. Corey.

University of Wisconsin. 


Douglas Spencer. The Fulcra of Conflict. Yonkers, N. Y.: World
Book Co., 1939, pp. 306. 


In spite of its esoteric title, this book represents a definite step
forward in dealing with the complexities of personality measurement.
Since the title conveys little notion of the subject-matter, it is well
to say at the outset that the book is a study of a new instrument for
personality
measurement. The test was formulated on the hypothesis that no 
fact of behavior and no part of experience has any certain meaning 
for personality measurement. It is only by comparison with the 
attitudes and behavior of others that it acquires meaning for the 
individual. To be of average intelligence, or to believe oneself to 
possess average intelligence may be satisfying or a source of conflict, 
depending on one’s ideals and the intelligence and attitudes toward 
intelligence of one’s family and associates. It is these other factors, 
“the means through which the experience exerts influence for conflict,”
that are christened the “fulcra” of conflict.

With this as a starting point, Spencer creates his test. In essence 
his instrument provides more adequately than ever before for the 
comparison of the adolescent’s self-rating with his rating of his own 
ideals, his parents’ attitudes and behavior, and the behavior of his 
associates. There are fifty-three items in the test, each repeated seven
times with appropriate changes in the wording to make them refer to 
the subject’s behavior, his ideal of behavior, his mother’s behavior, his 
mother’s ideal, father’s behavior, father’s ideal, and his associates’ 
behavior. The items cover well chosen fields, based on Spencer’s 
experience in clinical work and psychological counselling. There are 
questions regarding social life, home relationships, personal character- 
istics, conduct (including a few questions related to sex conduct), and 
school relationships. These questions were administered to two 
hundred high-school students, who were admirably prepared for the 
task to ensure a maximum of frankness. Precautions were taken so 
that while the tests were unsigned and essentially anonymous, infor- 
mation regarding age, IQ, etc. was available for each child. 

It is not surprising that the major findings of the study confirm 
the hypothesis. After exhaustive attempts to measure reliability 
(largely by retest) and validity (largely by internal consistency) the 
author gives the data which indicate clearly that identical ratings of 
own behavior or attributes may have very different meaning to differ- 
ent individuals, representing no conflict in one instance, extreme 
conflict in another. 

Not the least valuable part of the work is the thoroughgoing
discussion of the philosophy as well as the techniques of personality
measurement which constitutes the first three chapters. This will be
required reading for all those who plan new efforts in this field. The
inadequacies of trait measurement, the over-use of statistical
techniques, and the lack of sound thinking and sound formulation of
hypotheses for testing are forcefully presented.

To this reviewer Spencer’s discussion of personality conflict is 
definitely disappointing. Profound consideration of this problem, by 
those oriented to clinical work, is badly needed. Unfortunately the 
author contents himself with quoting others and then comfortably 
defines conflict for the purpose of his research as the discrepancy 
between the subject’s views of his own behavior and his beliefs
regarding the various “fulcra.” This is too superficial a view.
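The working definition criticized here, conflict as the discrepancy between the subject’s view of his own behavior and his beliefs about the fulcra, can be made concrete with a small sketch. It assumes that each of the fifty-three items is answered on one common rating scale for all seven referents, and it simply counts, for each fulcrum, the items on which that rating differs from the self-rating. The scoring rule, the labels, and the data are hypothetical illustrations, not Spencer’s published procedure.

```python
from typing import Dict, List

# Hypothetical labels for the six referents compared with the self-rating.
FULCRA = [
    "own ideal", "mother's behavior", "mother's ideal",
    "father's behavior", "father's ideal", "associates' behavior",
]


def discrepancy_counts(
    self_ratings: List[int], fulcrum_ratings: Dict[str, List[int]]
) -> Dict[str, int]:
    """For each fulcrum, count the items whose rating differs from the self-rating."""
    counts: Dict[str, int] = {}
    for name, ratings in fulcrum_ratings.items():
        if len(ratings) != len(self_ratings):
            raise ValueError(f"{name}: expected {len(self_ratings)} ratings")
        counts[name] = sum(1 for own, other in zip(self_ratings, ratings) if own != other)
    return counts


# Three hypothetical items rated 1-5; a full record would have 53 items per referent.
self_view = [3, 4, 2]
answers = {name: [3, 4, 2] for name in FULCRA}
answers["own ideal"] = [5, 4, 4]
print(discrepancy_counts(self_view, answers)["own ideal"])  # 2
```

A count of this kind treats every mismatch alike, which is exactly the simplification the reviewer objects to.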

Another disappointment to clinicians is the meager practical out- 
come of the research. By additional questions Spencer finds that 
more than fifty per cent of the group would have answered questions 
untruthfully if the test had not been anonymous. Those with the 
most conflict show the greatest tendency toward distorting their 
responses, thus casting considerable doubt on the use of this type of 
procedure with clinical subjects of adolescent age. 

A relatively untapped field of discussion in the book is the wealth 
of revealing data as to adolescent attitudes in a fairly typical high- 
school group. Attitudes toward social life, smoking, drinking, petting, 
religion, school authorities, and the like, are not only revealed, but 
compared and contrasted with the subject’s beliefs about parental 
attitudes and behavior on these same topics. This material is of much 
greater value, because more carefully obtained, than many studies
whose primary purpose has been to probe the viewpoint of youth. 

In general the book places a needed emphasis on a soundly com- 
parative approach for determining stress and strain within the indi- 
vidual. It is to be hoped that on the basis of Spencer’s findings 
personality tests will be developed which can bring forth truthful 
responses, and yet concern themselves with these important conflict 
situations, rather than with hypothetical traits. Carl R. Rogers.

Rochester Guidance Center, Rochester, New York. 


Frederick H. Lund. Emotions: Their Psychological, Physiological,
and Educative Implications. New York: Ronald Press, 1939,
pp. 305.

William H. Mikesell. Mental Hygiene. New York: Prentice-Hall,
1939, pp. 456.

Harry Roberts and M. N. Jackson. The Troubled Mind. New
York: Dutton and Co., 1939, pp. 284.


Emotions, prepared under the auspices of a special committee of
the American Council on Education, presents a concise summary of
the principal techniques employed in the field of emotions and the
present status of experimental findings in this field. Five of the nine 
chapters are given over to an analysis of data bearing on the neuro- 
glandular basis of emotional reactions and the physiological changes 
associated with various emotional states. The remaining chapters 
are concerned with the identification of emotions, the development 
and control of emotions, and the inter-relationship of emotions and 
motivation. Lund has skillfully blended the numerous studies included 
in this review into an informative and highly readable book that will 
undoubtedly have extensive usage both as a source book and a text. 

The format of Mental Hygiene is that of a text, but the contents 
do not fully support the external appearance. It is essentially a 
semi-popular book on self-improvement offering objective and, for 
the most part, psychologically sound advice on such topics as over- 
coming fear and worry, increasing personal efficiency, and controlling 
the forces of suggestion. 

Written in a popular vein for the intelligent layman, The Troubled 
Mind briefly surveys the fields of personality adjustment and mental 
disease. Both with respect to literary merit and scientific accuracy, it 
is superior to the usual run of books in this field. No simple formulae 
for mental health are presented. The authors, apparently British 
physicians, content themselves with presenting a simplified digest of 
current psychopathology. James D. Page.

The University of Rochester. 
Counselor, San Diego (California) City Schools. 


J. Eisenson. The Psychology of Speech. New York: Crofts, 1938,
pp. 280.


This book will be welcomed by the teacher of introductory and 
social psychology as one to which students interested in the psychology 
of speech and its development may profitably be referred. In the 
words of the author: “This book attempts to present the principles of
psychology which underlie the problems of speech.” The contents of
the book are not such as to make it suitable for a longer course, and 
advanced students in psychology and allied fields will probably find it 
only mildly stimulating. 

The book is divided into five sections. The first deals with the 
nature of speech and its origin in the race. Section two considers 
basic psychological aspects of speech, treating in turn the nervous 
mechanism and speech, emotion and speech, the psychology of learning
as applied to speech, the psychology of meaning, and speech and
thought. Section three is devoted to the development of speech and 
language in the child, and includes a chapter on disorders in the speech 
of children. Personality and speech are treated in section four. This
section includes a chapter on personality deviations and speech which,
in the reviewer’s opinion, deserves special note. It considers language
and speech development in the blind, the stutterer, the manic-depressive,
the schizophrenic, and the aphasic. Students of abnormal personalities
will find this chapter very interesting. Section five deals with the
psychology of the audience.
The material in this section is not as well handled as that of preceding 
sections. It gives the impression of being more inspirational than 
factual. 

In the reviewer’s candid opinion this is not a significant treatment
of the field, but for certain selected groups it should prove a most
useful supplementary text. J. M. Porter, Jr.

Carnegie Institute of Technology. 




























